CHAPTER 3



The Structure of the Genome 

John S. Sussenbach 



I. INTRODUCTION 

Adenovirus particles have a highly ordered structure and are composed 
of protein and DNA. Human adenoviruses contain about 87% protein 
and 13% DNA (Green and Pina, 1963), while the larger avian chick em- 
bryo lethal orphan (CELO) virus consists of 83% protein and 17% DNA 
(Laver et al., 1971). In virions, the viral DNA is tightly associated with 
several virus-coded proteins. Disruption of virions with acetone, urea, or 
pyridine, or repeated freezing and thawing, releases the viral cores, which, 
in addition to the viral DNA, still contain about 18-20% of the total 
protein of the virions (Laver et al., 1967, 1968; Maizel et al., 1968; Prage 
et al., 1968, 1970). The proteins found in viral cores are mainly two basic 
polypeptides. The major core protein is identical to polypeptide VII [mo- 
lecular weight 18,000 (18K)], of which about 1000 copies are present in 
each viral particle. The minor core protein is polypeptide V (molecular 
weight 45.5K), of which each virion contains about 200 copies (Laver et
al., 1968; Prage et al., 1968, 1970; Prage and Pettersson, 1971; Russell et 
al., 1971; Everitt et al., 1973; Laver, 1970). However, when cores are 
prepared by extraction of virions with sarkosyl, only polypeptide VII is
found associated with the DNA (Brown et al., 1975). The different protein 
compositions of pyridine and sarkosyl cores suggest that polypeptide VII 
is more intimately associated with the viral genome than is polypeptide 
V. 

Corden et al. (1976) concluded that adenovirus DNA packed in vi- 
rions has a chromatinlike structure. They found that digestion of dis- 
rupted virions with micrococcal nuclease cleaves the viral genome into 
fragments about 200 nucleotides long. However, these experiments could 
not be repeated by Tate and Philipson (1979). Mirza and Weber (1982)
proposed that although adenovirus DNA is indeed packed into subunits,
its organization in the virion is not completely the same as that of eu- 
karyotic chromatin. Partial deoxyribonuclease (DNase) digestion of eu- 
karyotic chromatin leads to stretches of DNA with a length of 200 nu- 
cleotide pairs associated with histones. Mirza and Weber (1982) found 
that viral chromatin does indeed have a nucleosomelike structure, but 
that partial DNase digestion yields monomers of about 150 nucleotide 
pairs of DNA wrapped around three dimers of polypeptide VII. These
monomers are linked by a variable length of DNA associated with one 
copy of polypeptide V. 
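
As a rough consistency check on these numbers, the sketch below combines the copy numbers quoted earlier in this section (about 1000 copies of polypeptide VII and about 200 copies of polypeptide V per virion) with the monomer model of Mirza and Weber (1982). The genome length of 36,000 nucleotide pairs is the Ad2/Ad5 figure given in Section III. The assumption that every copy of polypeptide VII sits in a six-copy monomer is made here for illustration only and is not a claim of the cited authors.

```python
# Back-of-envelope packing estimate. All inputs are approximate values
# quoted in the text; the model itself is an illustrative assumption.
VII_COPIES = 1000      # polypeptide VII copies per virion
V_COPIES = 200         # polypeptide V copies per virion
VII_PER_MONOMER = 6    # three dimers of polypeptide VII per monomer
BP_PER_MONOMER = 150   # nucleotide pairs wrapped per monomer
GENOME_BP = 36_000     # approximate Ad2/Ad5 genome length


def packing_estimate(vii_copies=VII_COPIES, v_copies=V_COPIES,
                     genome_bp=GENOME_BP):
    """Return rough per-virion packing figures under the stated assumptions."""
    monomers = vii_copies // VII_PER_MONOMER
    wrapped_bp = monomers * BP_PER_MONOMER
    linker_bp = genome_bp - wrapped_bp
    return {
        "monomers": monomers,
        "wrapped_bp": wrapped_bp,
        "mean_linker_bp": linker_bp / monomers,
        "v_copies_per_monomer": v_copies / monomers,
    }


if __name__ == "__main__":
    for name, value in packing_estimate().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the estimate gives 166 monomers, a mean linker of some tens of nucleotide pairs, and about 1.2 copies of polypeptide V per monomer, which is at least compatible with one copy of polypeptide V per linker.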

Since adenovirus DNA is tightly associated with virion proteins, pro- 
tein-free DNA can be obtained only by extensive digestion of virions or 
viral cores with proteolytic enzymes (papain, pronase, or proteinase K) 
followed by sodium dodecyl sulfate (SDS)-phenol extraction (van der Eb 
and van Kesteren, 1966; Green et al., 1967; van der Eb et al., 1969; Laver 
et al., 1971). The DNA thus isolated has a linear structure and has been 
characterized in great detail. 

An alternative isolation procedure for adenovirus DNA was first ap- 
plied by Bellett and co-workers for CELO and adenovirus type 2 (Ad2) 
DNA (Robinson et al., 1973; Robinson and Bellett, 1975a). These
investigators isolated DNA in the absence of proteolytic enzymes, employing
an extraction with 4 M guanidinium hydrochloride. In the electron
microscope (EM), the isolated DNA appears circular, and it can be
converted into a linear configuration by digestion of the preparation with
proteolytic enzymes (Robinson et al., 1973). Similar studies have also
been performed for Ad5 DNA (Keegstra et al., 1977). The sensitivity of
the circular structures to proteolytic enzymes suggests that they are
maintained by a protein linker.

By in vitro labeling of the protein moiety with ¹²⁵I, it could be
demonstrated that a polypeptide with a molecular weight of 55K is covalently
attached to the 5' end of each DNA strand (Rekosh et al., 1977). This 
protein, designated terminal protein, has a hydrophobic character, which 
facilitates joining of the ends of the DNA-protein complexes, resulting 
in the formation of circular structures and concatemers. The properties 
of the linear deproteinized DNA as well as the characteristics of the 
circular DNA-protein complexes are discussed in more detail in the fol- 
lowing sections. 



II. GROUPING OF ADENOVIRUSES BASED ON DNA
HOMOLOGY 

The different human adenoviruses have been classified into 
subgroups on the basis of different criteria. Rosen (1960) originally pro- 
posed three subgroups based on differences in hemagglutinating capacity. 
Hierholzer (1973) extended this classification system to ten subgroups.
On the basis of the apparent molecular weights of virion polypeptides V,
VI, and VII, Wadell (1978) arranged 20 human serotypes into five groups.
A completely different type of classification is based on the oncogenicity 
of the human adenoviruses. The different serotypes have been subdivided 
into a highly oncogenic subgroup A (Ad12, Ad18, Ad31), a weakly
oncogenic subgroup B (e.g., Ad3 and Ad7), and a nononcogenic subgroup C
(e.g., Ad2 and Ad5) (Trentin et al., 1962; Girardi et al., 1964; Huebner et 
al., 1962, 1965; Larson et al., 1965; Pereira et al., 1965; Green, 1970). It 
is interesting to note that there is a correlation between the guanine- 
cytosine (GC) content of the human adenovirus DNAs and the oncogen- 
icity of the viruses. The GC content of the DNAs decreases with in- 
creasing oncogenicity (Pina and Green, 1965) (Table I). Probably this cor- 
relation has no physiological basis, since, in contrast to the human 
adenoviruses, the oncogenic simian adenoviruses tend to have slightly 
higher GC contents than the nononcogenic adenoviruses (Goodheart,
1971). Further, the oncogenic simian serotypes have GC contents that 
are in general higher than those of the nononcogenic human serotypes. 

The most meaningful and fundamental way to group adenoviruses 
is based on DNA sequence homology. Fortunately, the DNA homology 
grouping is in agreement with other groupings of human adenoviruses 
on the basis of oncogenicity, GC content, and molecular characteristics 
of viral proteins (Table I). Originally, Green et al. (1970) determined the 
homology among different DNAs employing filter hybridization. Re- 
cently, the classification was improved by employment of liquid-phase 
molecular hybridization with in vitro-labeled viral DNA. A total of 31 
different human adenovirus serotypes were divided into five different 
subgroups, A-E (Green et al., 1979b). In general, members of the same 
subgroup have genomes that are more than 90% homologous. However,
members of subgroup A share only 48-69% of their DNA sequences.
The homology among members of different subgroups is less than 20% 
(Table I). 

The major regions of least homology among DNAs of different 
human serotypes have been visualized by heteroduplex mapping (Garon 
et al., 1973). Heteroduplexes of subgroups B and C DNAs contain two 
major regions of heterology located at positions 50-65 and 78-91 on the 
adenovirus genome map. Heteroduplexes of members of subgroup A show 
a more complex distribution of homologous and heterologous regions. 
However, in this case, too, heterology is found at the two positions men- 
tioned above. 

Using the single-strand specific endonuclease from Neurospora
crassa, Bartok et al. (1974) were able to digest specifically the
heterologous regions from heteroduplexes of Ad2 and Ad5 DNA and obtained
three specific fragments, in agreement with the heteroduplex mapping. 
The heterologous regions contain the genetic information of the major 
coat proteins hexon and fiber, which play an important role in the
serological classification of the different adenovirus serotypes. In addition,
one of the heterologous regions codes for a group of nonvirion early pro- 
teins (see Section VII). 



III. PHYSICOCHEMICAL PROPERTIES OF ADENOVIRUS
DNA



DNA extracted from adenovirus particles by digestion with
proteolytic enzymes has a linear double-stranded structure (van der Eb
and van Kesteren, 1966; Green et al., 1967; van der Eb et al., 1969;
Younghusband and Bellett, 1971). The size of the viral genome varies from
serotype to serotype. The molecular weights of the human adenovirus
DNAs range from 19-22 × 10⁶ for the highly oncogenic serotypes Ad12,
Ad18, and Ad31 to 23-24 × 10⁶ for the nononcogenic serotypes Ad1,
Ad2, and Ad5 (Green et al., 1967) (Table I). On the basis of nucleotide
sequence data and the sum of restriction fragments, it has been inferred
that the genomes of Ad2 and Ad5 are about 36,000 nucleotide pairs long and
that Ad12 DNA is 34,300 nucleotide pairs long. The sizes of the genomes
of nonhuman serotypes are comparable to those of their human
counterparts [that of mouse serotype FL DNA being 20.7 × 10⁶ (Temple et
al., 1981) and of simian adenovirus SA7 DNA being 22 × 10⁶ (Burnett
and Harrington, 1968)]. On the other hand, the genome of the avian chick
embryo lethal orphan (CELO) virus is much larger, measuring 30 × 10⁶
(Younghusband and Bellett, 1971; Laver et al., 1971).
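
The molecular weights and the lengths in nucleotide pairs quoted above can be related by a simple conversion. The sketch below assumes an average mass of about 650 daltons per nucleotide pair; this factor is a commonly used approximation introduced here for illustration and is not taken from the chapter, so the results should be read as order-of-magnitude checks only.

```python
# Assumed average mass of one nucleotide pair of double-stranded DNA (daltons).
# This is an illustrative approximation, not a value given in the chapter.
DALTONS_PER_BP = 650.0


def bp_to_daltons(n_bp, daltons_per_bp=DALTONS_PER_BP):
    """Approximate molecular weight of a duplex of n_bp nucleotide pairs."""
    return n_bp * daltons_per_bp


def daltons_to_bp(mass, daltons_per_bp=DALTONS_PER_BP):
    """Approximate length in nucleotide pairs of a duplex of the given mass."""
    return round(mass / daltons_per_bp)


if __name__ == "__main__":
    # Lengths quoted in the text for Ad2/Ad5 and Ad12.
    for name, n_bp in [("Ad2/Ad5", 36_000), ("Ad12", 34_300)]:
        print(f"{name}: {n_bp} bp ~ {bp_to_daltons(n_bp) / 1e6:.1f} x 10^6")
    # Molecular weight quoted in the text for CELO DNA.
    print(f"CELO: 30 x 10^6 ~ {daltons_to_bp(30e6)} bp")
```

With this factor, 36,000 nucleotide pairs correspond to about 23.4 × 10⁶, inside the 23-24 × 10⁶ range quoted for Ad2 and Ad5, whereas 34,300 nucleotide pairs give about 22.3 × 10⁶, slightly above the range quoted for Ad12. The older molecular-weight estimates and the sequence-based lengths thus agree only approximately.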

When native adenovirus DNA is digested with Escherichia coli
exonuclease III and is subsequently examined under the EM, no
circularization of the linear genome is observed, indicating that adenovirus DNA
is not terminally redundant, as T7 DNA is (Green et al., 1967;
Younghusband and Bellett, 1971). On the other hand, when double-stranded DNA
(dsDNA) is denatured and reannealed at low DNA concentrations, both
strands of human as well as of avian adenovirus DNA are able to form
single-stranded circles (Garon et al., 1972; Wolfson and Dressler, 1972;
Robinson and Bellett, 1975b). The formation of single-stranded circles
indicates that adenovirus DNA contains an inverted terminal repetition.
This inverted terminal repetition is discussed in more detail in Section V.
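
The distinction between a direct terminal redundancy and an inverted terminal repetition can be made concrete with a toy sequence. In the sketch below, the 15-nucleotide repeat and the unique region are made-up placeholders, not adenovirus sequences, and the function name is hypothetical. A single strand can close into a circle only if its 5'-terminal bases can pair with its own 3' terminus, which is the case for an inverted repetition but not for a direct one.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_terminal_repeat_length(strand):
    """Count how many 5'-terminal bases can pair with the strand's own 3' end."""
    n = 0
    for i in range(len(strand) // 2):
        if strand[i] != strand[-1 - i].translate(COMPLEMENT):
            break
        n += 1
    return n


# Toy sequences (hypothetical, for illustration only).
TOY_REPEAT = "CATCAATAATATACC"
TOY_UNIQUE = "GGATCCGGTTGG"

# Inverted terminal repetition: the right end is the reverse complement of the left.
itr_strand = TOY_REPEAT + TOY_UNIQUE + reverse_complement(TOY_REPEAT)
# Direct terminal redundancy: the same sequence at both ends.
direct_strand = TOY_REPEAT + TOY_UNIQUE + TOY_REPEAT

if __name__ == "__main__":
    print(inverted_terminal_repeat_length(itr_strand))
    print(inverted_terminal_repeat_length(reverse_complement(itr_strand)))
    print(inverted_terminal_repeat_length(direct_strand))
```

In this toy case both complementary strands of the molecule with the inverted repetition have self-complementary ends of the full repeat length, while the strand with a direct redundancy has none, mirroring the observation that both strands of adenovirus DNA form single-stranded circles.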

The distribution of adenine-thymine (AT) and GC base pairs in
adenovirus DNA has been investigated by partial thermal denaturation
mapping. The unique thermal denaturation patterns of DNAs from Ad2,
Ad5, and Ad12, the avian CELO virus, and the mouse strain FL indicate
that adenovirus DNA is not circularly permuted as T7 DNA, but that all
DNA molecules from the same serotype have an identical nucleotide
sequence (Doerfler and Kleinschmidt, 1970; Younghusband and Bellett,
1971; Doerfler et al., 1972; Ellens et al., 1974; Temple et al., 1981). In
most denaturation patterns, the distribution of AT and GC base pairs
along the DNA molecule is asymmetrical. By convention, the AT-rich
half of an adenovirus DNA molecule has been designated the right-hand
half of the molecule (Doerfler and Kleinschmidt, 1970). In some cases
(Ad2 and Ad5), the AT- and GC-rich halves of the DNA molecules can
be separated by CsCl or HgCl₂-Cs₂SO₄ gradient centrifugation of sheared
DNA (Kimes and Green, 1970; Doerfler and Kleinschmidt, 1970; Horwitz,
1974; Graham et al., 1974b). However, due to the more even distribution
of AT and GC base pairs in Ad12 DNA, separation of the left and right
halves of Ad12 DNA by this procedure is not possible (Doerfler et al.,
1972).

Separation of the complementary strands of adenovirus DNA can be 
performed by complexing of the single strands of denatured native DNA 
with poly(I:G) or poly(U:G). Intact complementary strands have been 
obtained for Ad2, Ad5, Ad7, and Ad12 DNA (Kubinski and Rose, 1967;
Landgraf-Leurs and Green, 1971; Patch et al., 1972; Tibbetts et al., 1974; 
Vlak et al., 1975). Since the two complementary strands bind unequal 
amounts of the copolymers, the two strands can be separated by equilib- 
rium density-gradient centrifugation or by gel electrophoresis (Goldbach 
et al., 1978). Complementary strands of Ad2 and Ad5 DNA have also 
been separated by alkaline CsCl equilibrium density-gradient
centrifugation (Sussenbach et al., 1973; Sharp et al., 1975). The buoyant densities
of the two strands in alkaline CsCl differ by 2-4 mg/ml, which is
sufficient for separation. The heavy strands of Ad2 and Ad5 DNA obtained
by poly(U:G)-CsCl gradient centrifugation have the lower density in 
alkaline CsCl (Tibbetts et al., 1974; Vlak et al., 1975). 

Tibbetts et al. (1973) showed that Ad2 single-stranded DNA (ssDNA)
is retained by hydroxyapatite columns under conditions generally used
for selective retention of dsDNA, probably due to partially complementary
regions in the single strands. Other indications for regions of
complementarity in adenovirus ssDNA were obtained by EM. Under suitable
conditions, an extended region of secondary structure is observed at
position 73 on the conventional adenovirus map (Wu et al., 1977). Regions
that contain complementary sequences were also detected at the
molecular termini (Padmanabhan and Green, 1976; Wu et al., 1977). Digestion
of native Ad2 DNA with exonuclease III followed by repair synthesis of
the exposed single-stranded ends with DNA polymerase I revealed the
presence of self-complementary sequences about 50 nucleotides long,
located at a distance of about 180 nucleotides from each molecular end
(Padmanabhan and Green, 1976). Nucleotide sequence analysis of the
termini confirmed the existence of self-complementary sequences in
these regions.

IV. COORDINATE SYSTEM 

To arrive at an unambiguous nomenclature for the two complementary
strands of adenovirus DNA, it has been proposed to adopt a nomenclature
that is based on the direction of transcription, rather than on
physical properties, e.g., densities. By convention, the AT-rich half of the
DNA molecule is oriented to the right and the strand transcribed to the
right is called the r-strand, while the leftward-transcribed strand is
designated the l-strand.* The r-strand appears to be identical to the strand
with the higher density in alkaline CsCl and to the strand with lower
density in poly(U:G)-CsCl (see the proposal in J. Virol. 22:830, 1977).
Further, it is agreed to divide the adenovirus DNA into 100 map units
(m.u.) from left to right on the viral genome.

The agreement on a unique orientation of adenovirus DNA molecules 
formed the basis for an unambiguous mapping of significant landmarks 
on the adenovirus genome. With the discovery and the purification of 
restriction endonucleases, powerful tools became available to dissect the 
adenovirus genome in distinct specific fragments (for a review of available 
enzymes, see Roberts, 1981). These fragments have been used to unravel 
the organization of the adenovirus genome in detail. For many adenovirus 
serotypes, accurate restriction endonuclease cleavage maps of the viral 
genome are available, and with the increasing knowledge of the nucleo- 
tide sequences of several adenovirus DNAs, this number is still growing. 
A summary of restriction endonuclease cleavage maps is presented in 
Appendix A. 

Many restriction fragments have been inserted into prokaryotic plas- 
mids employing recombinant DNA techniques (Stenlund et al., 1980). 
These adenovirus DNA-containing plasmids are very useful for obtaining 
large amounts of specific fragments, especially of poorly growing sero- 
types. They have frequently been used for nucleotide sequence analysis 
and site-directed mutagenesis. The two complementary strands of re- 
striction fragments have been separated by annealing denatured frag- 
ments in the presence of an excess of one of the intact complementary 
strands followed by separation of the partial duplex and the remaining 
single strand. Strand separation has also been obtained by gel electro- 
phoresis of denatured restriction fragments (Tibbetts and Pettersson, 
1974; Sharp et al., 1975; Sussenbach et al., 1973; Goldbach et al., 1978). 
These single strands have frequently been used to isolate specific mes- 
senger RNA (mRNA) species. 

* It should be noted that r-strand transcripts are equivalent to l-strand DNA sequences and
that l-strand transcripts are homologous to r-strand sequences.

The most detailed information on the structure of the adenovirus
genome and the positions of important landmarks became available by
nucleotide sequence analysis of DNAs from different adenovirus
serotypes (see Appendix B). The most extended sequences have been
established for Ad2 DNA, of which about 70% has been sequenced (Arrand
and Roberts, 1979; Zain and Roberts, 1979; Zain et al., 1979a,b;
Shinagawa and Padmanabhan, 1979; Galibert et al., 1979; Akusjarvi and
Pettersson, 1978a,b, 1979a,b; Herisse et al., 1980, 1981; Akusjarvi et al.,
1980, 1981; Shinagawa et al., 1980; Herisse and Galibert, 1981; Alestrom
et al., 1980, 1982; Akusjarvi and Persson, 1981a; Kruijer et al., 1982;
Gingeras et al., 1982). This allows the positioning of many landmarks on
the Ad2 genome at the nucleotide level. Comparison of the Ad2
nucleotide sequence and the restriction maps revealed that the nucleotide
equivalent of 1% of the genome depends on the particular location on the Ad2
genome (Gingeras et al., 1982). A value of 365 nucleotides for 1% gives
the best fit for the left end, while a value of 357
nucleotides for 1% is the best fit for the right end. The differences in
nucleotide equivalent for 1% are probably caused by the differences in
nucleotide composition between the right and left halves of the Ad2
genome.
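
The position-dependent conversion between map units and nucleotides can be sketched as follows. The two conversion factors are those quoted above for Ad2; the function names, and the choice to measure left-end positions from the left terminus and right-end positions from the right terminus, are illustrative assumptions rather than a convention stated in the text.

```python
LEFT_END_NT_PER_MU = 365   # best fit near the left end (quoted in the text)
RIGHT_END_NT_PER_MU = 357  # best fit near the right end (quoted in the text)


def nt_from_left_end(map_units):
    """Approximate distance in nucleotides from the left terminus."""
    return round(map_units * LEFT_END_NT_PER_MU)


def nt_from_right_end(map_units):
    """Approximate distance in nucleotides from the right terminus."""
    return round((100 - map_units) * RIGHT_END_NT_PER_MU)


if __name__ == "__main__":
    # A landmark 10 m.u. in from either end of the genome.
    print(nt_from_left_end(10))    # distance of map position 10 from the left end
    print(nt_from_right_end(90))   # distance of map position 90 from the right end
```

For landmarks 10 map units in from either end, the two factors give distances that differ by 80 nucleotides, which illustrates why a single value for 1% cannot fit both ends of the map equally well.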



V. INVERTED TERMINAL REPETITION 

The existence of an inverted terminal repetition (ITR) in adenovirus 
DNA was discovered when denatured DNA was reannealed at low
concentrations and examined under the EM. A high percentage of the single
strands were present in a circular form, indicating that adenoviral DNA
contains an ITR (Garon et al., 1972; Wolfson and Dressler, 1972). So far,
ITRs have been detected in every serotype investigated, although the 
length of the repetitions may vary (Table I). The general occurrence of 
an ITR in adenovirus DNA suggests very strongly that this feature plays 
an important role in viral propagation. 

The single-stranded circular structures have a rather high thermal
stability, which is consistent with a highly ordered base-pairing between
the terminal sequences (Garon et al., 1972; Wolfson and Dressler, 1972).
It also suggests that the ITRs must be of considerable length.
Circularization of adenovirus ssDNA can be abolished by digestion with
exonuclease III, and this treatment has been used to estimate the size of the
terminal repetitions. Garon et al. (1972) concluded that the length of the
terminal repetition ranged from 350 base pairs (bp) for Ad2 to 1400 bp
for Ad31. However, since inverted repeats of these sizes can be visualized
under the EM and no double-stranded regions were detected in the
single-stranded circles, it was concluded that the exonuclease III experiments
lead to an overestimation of the lengths of the ITRs. An
exceptionally long ITR was detected in Ad18 DNA (Garon et al., 1975). In
single-stranded circles of this serotype, a double-stranded panhandle with
a mean length of 0.31 µm was seen, equivalent to 3% of the genome
length.

A more accurate estimate of the size of the ITR of Ad2 DNA was 
obtained by restriction enzyme analysis of end-labeled DNA. When a 
restriction enzyme cleaves within the repeated sequence, both molecular 
ends will yield a fragment of the same size, while cleavage outside the 
repeated sequence will yield fragments of different size. Employing this 
approach, Roberts et al. (1974) estimated that the terminal repetition of
Ad2 DNA is between 100 and 140 nucleotides long (also see Arrand et 
al., 1975). 
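
The logic of this restriction-enzyme approach can be illustrated with a toy molecule. In the sketch below, the 20-nucleotide repeat, the unique region, and the two six-base palindromic sites are made-up placeholders chosen for illustration; the sites are treated as being cut at their centre (blunt ends), which is a simplifying assumption, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_fragment_lengths(genome, site):
    """Return (left, right) terminal fragment lengths after complete digestion.

    Assumes a palindromic site cut at its centre, so that fragment
    lengths are the same on both strands.
    """
    half = len(site) // 2
    cuts = []
    start = genome.find(site)
    while start != -1:
        cuts.append(start + half)
        start = genome.find(site, start + 1)
    if not cuts:
        raise ValueError("site not present in genome")
    return cuts[0], len(genome) - cuts[-1]


# Toy molecule (hypothetical): one site lies inside the repeat, the other outside.
TOY_REPEAT = "CATCATCAATGTTAACATAT"
TOY_UNIQUE = "AGCTAGCT" + "CCCGGG" + "AGCT" * 5
toy_genome = TOY_REPEAT + TOY_UNIQUE + reverse_complement(TOY_REPEAT)

if __name__ == "__main__":
    print(terminal_fragment_lengths(toy_genome, "GTTAAC"))  # site inside the repeat
    print(terminal_fragment_lengths(toy_genome, "CCCGGG"))  # site outside the repeat
```

The site inside the repeat yields two terminal fragments of equal length, whereas the site in the unique region yields terminal fragments of different lengths. Bracketing the repeat between the outermost site that gives equal fragments and the innermost site that gives unequal ones is the reasoning behind a range estimate such as the one quoted above.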

Recently, nucleotide sequence analysis has been used to determine
exactly the size and composition of the ITRs of several adenovirus serotypes
(Appendix B). Some general features of the adenovirus ITRs can be
demonstrated in the ITR of Ad5 DNA, the first sequenced repetition. The ITR
of Ad5 is 103 bp long (Steenbergh et al., 1977). Its sequence is unique and
does not contain extended self-complementary regions. A striking
property of the Ad5 terminal repetition is the asymmetrical distribution of
GC and AT base pairs. The first 50 bp contain 72% AT, while the next
50 bp have only 27% AT. Although the lengths of inverted repeats of
other serotypes may differ considerably, they all show the same
asymmetrical distribution of base pairs. As for a function of this property, it
is not unlikely that the high AT content of the first half of terminal
repetitions is of relevance for a rapid unwinding of the molecular ends
during initiation of DNA replication.

Comparison of the inverted repetitions of serotypes from the same
subgroup shows a high degree of homology (see Appendix B). The
repetitions of Ad2 and Ad5 both have a length of 103 bp and are completely
identical (Steenbergh et al., 1977; Shinagawa and Padmanabhan, 1979),
although a particular Ad2 strain has been described whose repetition
is 102 bp long (Arrand and Roberts, 1979). The terminal repetitions of
Ad3 and Ad7 strain Greider both have a length of 136 bp and differ at 7
positions (Tolun et al., 1979; Shinagawa and Padmanabhan, 1980).
Comparison of two Ad7 strains (Greider and Gomen) reveals that both repeats
are 136 bp long but differ at 5 positions (Dijkema and Dekker, 1979;
Shinagawa and Padmanabhan, 1980). Similar strain differences have also
been found for Ad12. The length of the Ad12 ITR varies between 162
(Shinagawa and Padmanabhan, 1980) and 164 bp (Sugisaki et al., 1980;
Schwarz et al., 1982). In all ITRs determined except one, a dCMP residue
has been found at the 5' ends of adenovirus DNA. The exception is chick
embryo lethal orphan (CELO) DNA, which has a dGMP residue at its 5' end
(Alestrom et al., 1982a). In the ITRs of all human adenovirus DNAs,
the sequence ATAATATACCTTAT (nucleotides 9-22) is present (Tolun
et al., 1979); the regions of the inverted repetitions beyond nucleotide 50
show a low degree of homology, although in all serotypes an asymmetrical
distribution of base pairs is found. Comparison of the DNAs of the human
serotypes with mouse strain FL DNA (Temple et al., 1981) reveals that
they have the sequence ATAATATAC (nucleotides 9-17) in common,
while the homologous region between human adenovirus DNAs and
CELO DNA is located between positions 9 and 15 (ATAATAT) (Alestrom
et al., 1982a). It is very likely that the conserved sequences 9-15 and
9-17 play a crucial role in the initiation of DNA replication and are probably
involved in recognition of the site of initiation by the precursor of the
terminal protein. In this respect, it is interesting to note that mouse
adenovirus strain FL DNA can be replicated in an in vitro DNA
replication system of Ad2 DNA (Temple et al., 1981). Shinagawa and
Padmanabhan (1980) have pointed out that in Ad2, Ad3, Ad5, Ad7, and Ad12
DNA, an additional region of interesting homology is present. In these
serotypes, the hexanucleotide TGACGT is found at or near the site where
the sequences beyond the ITR begin to diverge. The function of this
homology is unknown.
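
The three conserved sequences quoted in this paragraph are nested: the sequences shared with mouse strain FL and with CELO are the leading parts of the block conserved among the human serotypes. The short sketch below checks this with 1-based, inclusive coordinates and computes the AT content of the block. Only the quoted 14-nucleotide sequence is used; the helper names are hypothetical.

```python
CONSERVED_9_22 = "ATAATATACCTTAT"  # nucleotides 9-22, as quoted in the text
FIRST_POSITION = 9                 # coordinate of the first base of the block


def itr_slice(first, last):
    """Return nucleotides first..last (1-based, inclusive) of the quoted block."""
    last_position = FIRST_POSITION + len(CONSERVED_9_22) - 1
    if first < FIRST_POSITION or last > last_position or first > last:
        raise ValueError("coordinates must lie within nucleotides 9-22")
    return CONSERVED_9_22[first - FIRST_POSITION:last - FIRST_POSITION + 1]


def at_fraction(seq):
    """Fraction of A and T bases in a sequence."""
    return sum(base in "AT" for base in seq) / len(seq)


if __name__ == "__main__":
    print(itr_slice(9, 17))   # region shared with mouse strain FL
    print(itr_slice(9, 15))   # region shared with CELO
    print(f"{at_fraction(CONSERVED_9_22):.0%} AT in nucleotides 9-22")
```

Twelve of the 14 positions in the block are A or T, in keeping with the AT-rich character of the first part of the ITR.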



VI. TERMINAL PROTEIN 

The presence of protein at the termini of adenovirus DNA was orig- 
inally detected by Bellett and co-workers, employing DNA isolation pro- 
cedures that avoid proteolytic digestion (Robinson et al., 1973; Robinson 
and Bellett, 1975a). These investigators observed that the DNA-protein 
complex obtained is resistant to boiling and treatment with SDS, indi- 
cating that the protein is probably covalently linked to the DNA (Ro- 
binson et al., 1973; Sharp et al., 1976; Carusi, 1977; Padmanabhan and 
Padmanabhan, 1977). 

When the buoyant densities of Ad2 and Ad5 DNA-protein complexes 
are compared with the densities of the corresponding DNAs isolated by 
digestion with pronase, a small difference of 2-10 mg/ml is found. This 
corresponds to an amount of protein present in the DNA-protein complex 
of a maximal 0.3% of the total virion protein (Robinson and Bellett, 
1975a; Keegstra et al., 1977). By gel electrophoresis of labeled DNA-free
terminal protein (TP), it could be established that TP has an apparent 
molecular weight of 55K (Rekosh et al., 1977). 

Due to the hydrophobic character of TP, DNA-protein complexes 
aggregate very easily. As a result of this aggregation, DNA-protein com- 
plexes accumulate on tops of agarose and polyacrylamide gels during elec- 
trophoresis. It has been observed that when DNA-protein complexes are 
digested with restriction endonucleases and the digestion products are 
separated by gel electrophoresis, the terminal fragments carrying TP pref- 
erentially stay on top of the gel, while internal fragments conventionally 
run into the gel (Brown et al., 1975; Sharp et al., 1976). Another way to 
separate the DNA-protein complexes from protein-free DNA is based on 
differential binding of these compounds to glass-fiber filters (Coombs and 
Pearson, 1978; Coombs et al., 1978). 

To establish the nature of the DNA-protein linkage, deproteinized 
DNA and DNA-protein complexes have been subjected to enzymatic and 
nonenzymatic treatments. Both types of DNA are inaccessible to
phosphatase, DNA polynucleotide kinase, and λ-exonuclease VII (Carusi,
1977; Sharp et al., 1976), indicating that the 5' ends of adenovirus DNA
are blocked. On the other hand, the 3' ends can freely be labeled with
terminal transferase and are accessible to exonuclease III. These results
are most easily explained assuming that in the DNA-protein complex,
TP is covalently attached to the 5' ends of the two complementary
strands. The inaccessibility of deproteinized DNA is probably due to the 
fact that the 5' ends are still linked to short peptides. Treatment of DNA- 
protein complexes or deproteinized DNA with alkali or piperidine re- 
moves these peptides and makes the DNA freely accessible for enzymes 
(Robinson et al., 1973; Carusi, 1977; Tolun et al., 1979; Rekosh, 1981).
TP can also be separated from adenovirus DNA by digestion with
nuclease S1 (Ariga et al., 1979; Roninson and Padmanabhan, 1980; Rijnders
et al., 1983). The DNA-protein complex is cleaved in close proximity to 
the protein-DNA linkage and yields a protein with a molecular weight 
of 55K (Rijnders et al., 1983). Recently, Rekosh (1981) showed that treat- 
ment of the Ad2 DNA-protein complex with piperidine releases a protein 
with a molecular weight of 52K. This observation suggests that after 
DNase I or S1 digestion, the TP isolated still contains a few nucleotide
residues. 

The nature of the linkage between TP and the DNA molecule has 
been elucidated by Desiderio and Kelly (1981). Their experiments clearly 
indicate that Ad2 TP is bound to DNA by a phosphodiester bond between 
the hydroxyl group of a Ser residue of TP and the 5'-phosphate group of
the terminal deoxycytidine residue of the two complementary strands of 
adenovirus DNA. The particular Ser residue in the TP amino acid se- 
quence involved in the linkage of TP to DNA has recently been identified 
(Smart and Stillman, 1982). 

The origin of TP has been uncertain for many years. Green et al. 
(1979c) showed by tryptic fingerprinting of TPs of five different human 
serotypes that these proteins were very similar in structure. On the other 
hand, Rekosh (1981) found different sizes for the TPs of different human 
serotypes, suggesting that TP is not of cellular origin. He concluded that 
TP is a highly conserved virus-coded protein. The viral origin of TP was 
unambiguously proved by Stillman et al. (1981), who showed that
cell-free translation of mRNAs selected from a region between coordinates
11 and 31.5 on the viral l-strand (see Section IV) leads to synthesis of
proteins with apparent molecular weights of 105, 87, and 75K. The 87K 
protein appeared to be identical to an 80K protein (Challberg et al., 1980) 
that is covalently attached to the 5' ends of growing Ad2 DNA strands 
synthesized in an in vitro DNA replication system (Challberg and Kelly, 
1979a,b). The 80K protein is structurally related to TP, suggesting that 
TP is synthesized as an 80K precursor TP (pTP) and that pTP is the active 
form of TP in adenovirus DNA replication. The different molecular 
weights found for pTP (80 and 87K) are due to the use of different mo- 
lecular-weight markers. The 80/87K protein appears to be identical to 
the protein that is covalently attached to the DNA from
temperature-sensitive (ts) mutant Ad2ts1 virions grown at the nonpermissive
temperature (Stillman et al., 1981; Challberg and Kelly, 1981). Ad2ts1 is a
mutant that cannot cleave virus-coded precursor proteins to their mature 
counterparts during virion maturation (Begin and Weber, 1975; Weber et 
al., 1975). 
The mapping of pTP on the virus genome led to the definition of a
new early transcription unit, designated E2b. The structure of this region 
is discussed in detail in Section VII.B.3. 

Evidence has been presented that TP plays an essential role in the 
initiation of adenovirus DNA replication. Analysis of the in vitro DNA
replication system developed by Challberg and Kelly (1979a,b), in which
the DNA-TP complex is used as a template, showed that the first step 
in the replication of adenovirus DNA is the linkage of dCMP to pTP. The 
protein probably recognizes a specific sequence within the inverted ter- 
minal repetition, which might be involved in binding of pTP to the DNA 
(Tamanoi and Stillman, 1982). It is likely that the conserved sequence 
9-22 in different adenovirus serotypes functions as such a recognition
sequence. The presence of TP in the DNA-TP complex might stabilize 
the initiation complex. Recently, it was shown that the protein is dis- 
pensable (Tamanoi and Stillman, 1982), since adenovirus DNA devoid of 
TP or remaining amino acids can also be used as template in an in vitro 
DNA replication system. It has been proposed that the presence of TP in 
the DNA-TP complex protects the viral DNA against nucleolytic deg- 
radation. 

A protecting function of TP has also been proposed to explain the 
high infectivity of DNA-protein complexes. Deproteinized DNA is in- 
fectious when assayed by the calcium coprecipitation procedure (Ni- 
colson and McAllister, 1972; Graham and van der Eb, 1973). However, 
the infectivity of DNA-TP complexes is 50-100 times higher (Sharp et 
al., 1976; Chinnadurai et al., 1978; van Wielink, 1978). Although the
difference in infectivity might be due to a protective function of TP, it 
cannot be excluded that the presence of TP on the template is essential 
for accurate positioning of the pTP on the DNA during the first stage of 
initiation of adenovirus DNA replication. The role of TP in DNA rep- 
lication is discussed extensively in Chapter 7. 

VII. ORGANIZATION OF THE ADENOVIRUS GENOME

For the unraveling of the organization of the adenovirus genome, a 
great variety of techniques have been employed, i.e., DNA-RNA hy- 
bridization, R-loop mapping, genetic mapping of mutants, translation of 
preselected mRNA species, and nucleotide sequence analysis (for details, 
see Mautner et al., 1975; Sambrook et al., 1975; Grodzicker et al., 1975, 
1977; Chow et al., 1977b, 1979a,b; Berk and Sharp, 1977a, 1978; Westphal
et al., 1976; Westphal and Lai, 1977; Kitchingman et al., 1977;
Kitchingman and Westphal, 1980; Miller et al., 1980) (for sequences, see
Appendix B). Despite a substantial nucleotide sequence divergence, all
adenovirus serotypes studied so far show the same general genetic organization
(see Appendix B). Since the genomes of the highly homologous types Ad2
and Ad5 have been investigated most extensively, the organization of the 
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adenovirus genome is discussed employing for the most part data obtained 
with these particular serotypes. The precise location of major landmarks 
at the nucleotide level is indicated in the Ad2 sequence (Appendix B), 
unless otherwise stated. During the productive infection cycle of ade- 
noviruses, the different viral genes are expressed in a rather complex 
pattern (Tooze, 1981; Persson and Philipson, 1982). 

Traditionally, the adenovirus genes are subdivided into early genes, 
which are expressed before the onset of viral DNA replication, and late 
genes, which are transcribed after replication of adenovirus DNA has 
started. However, a group of intermediate genes has also been distin- 
guished. These genes are expressed at intermediate times in infection in 
the absence of DNA synthesis and are also easily detected at late times. 
The complex transcription pattern of adenovirus DNA is discussed ex- 
tensively in Chapter 5. A summary of the major RNA transcripts and the 
corresponding proteins is presented in Figs. 1 and 2. These diagrams
demonstrate that the adenovirus genetic information is scattered over the
two complementary strands. About 69% of all genetic information is
located on the rightward-transcribed strand (r-strand), while only 31% of
the coding sequences are present on the leftward-transcribed strand
(l-strand).

FIGURE 1. Transcriptional organization of the Ad2 genome. The genome is divided into
100 map units. The r-strand is rightward-transcribed into RNA and the l-strand leftward.
The direction of transcription is indicated by arrows. The capped 5' ends of the cytoplasmic
RNA indicate the positions of transcriptional promoters, while the arrowheads represent
the 3' polyadenylation sites. Gaps in arrows indicate intervening sequences, which have
been removed from the cytoplasmic RNA by splicing. The RNA shown in bold lines can
be detected early in infection before the onset of DNA replication (regions E1a, E1b, E2a,
E3, E4; also the late promoter at 16.5 units is active early in infection, leading to transcription
to 39 units). The light lines represent intermediate RNAs synthesized at early as well as at
late times in the infection cycle (E2a, E2b, polypeptide IX). The double-lined arrows indicate
late RNA species. Correlations of mRNAs with encoded proteins are based on cell-free
translation of selected RNA species and RNA mapping data. Proteins are designated by their
molecular weights in kilodaltons (K) or by roman numerals (virion components).

FIGURE 2. Protein-coding regions of the Ad2 genome. The regions on the adenovirus
genome that code for protein have been determined by hybrid-arrest translation, by in vitro
translation of preselected mRNAs, by RNA mapping, and by direct DNA and RNA sequence
analysis. The identified proteins are designated by their apparent or theoretical molecular
weights in kilodaltons or by roman numerals (virion components). Regions pVI, pVII, and
pVIII indicate the positions of the precursors of polypeptides VI, VII, and VIII. Interrupted
coding regions indicate discontinuous genes.

The positions of promoters and starts of transcription have been 
mapped via a variety of methods (Berk and Sharp, 1977b; Pettersson and
Mathews, 1977; Spector et al., 1978; Seghal et al., 1979; Wilson et al.,
1979; Chow et al., 1979a,b; Shaw and Ziff, 1980; Akusjarvi and Persson,
1981a; Stillman et al., 1981). Many of the positions of promoters have
been correlated with sequences generally indicated as TATA or Gold- 
berg-Hogness boxes. These AT-rich sequences are considered to repre- 
sent a constitutive part of promoter signals (see Chapter 5). The genes 
expressed early in infection are transcribed from six different promoters 
(r-strand: positions 1.3, 4.6, 16.5, and 76.6; l-strand: 75.1 and 99.1). The
intermediate genes are transcribed from promoters located at positions
9.7 on the r-strand and 16.1 and 75.1 on the viral l-strand. The long late
transcription unit uses the major late promoter at map position 16.5 on 
the viral r-strand. All primary transcription products of adenovirus DNA 
are processed in the nucleus before entering the cytoplasm. They are 
capped with 7meG5'pppN at the 5' end, and they are polyadenylated at
the 3' end. With one exception (polypeptide IX mRNA), all primary
transcription products are processed into families of related mRNAs that
share common 5' and 3' ends, but differ by alternative splicing (early
regions E1a, E1b, E2a, E3, and E4, intermediate regions E2b and IVa2, and
late regions L1, L2, L3, L4, and L5). It should be noted that in fact, analysis
of the late transcription unit of adenovirus led to the original discovery 
of the phenomenon of RNA splicing. A detailed analysis of the transcrip- 
tion of the adenovirus genome is presented in Chapter 5. The organization 
of the transcriptional units of the adenovirus genome will now be de- 
scribed systematically from left to right. Since the organization of the 
Ad2 and Ad5 genomes has been investigated most extensively, these ge- 
nomes are used for illustration. 

The positions of major landmarks of the transcription units are
indicated in Figs. 3-6 and Appendix B in the r- and l-strand sequences. It
should be borne in mind that sequences of the r-strand of DNA are
equivalent to RNA transcribed from the l-strand and that sequences of the
l-strand of the genome are equivalent to mRNA transcribed from the
r-strand. Unfortunately, the entire nucleotide sequences of Ad2 and Ad5
are not yet available, only a number of noncontiguous regions having
been sequenced. Therefore, continuous numbering of the base pairs has
not been added in Figs. 3-6 and Appendix B; instead, the sequence of each
specific region starts from the left with base pair number 1.
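The strand equivalence stated above can be illustrated with a short Python sketch. The 12-base sequence is hypothetical (it is not taken from Ad2), and the function names are introduced here only for illustration; the point is simply that RNA copied from an r-strand template reads like the l-strand.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe_from_template(template):
    """Return the RNA (5'->3') copied from a DNA template strand (5'->3')."""
    return reverse_complement(template).replace("T", "U")


# Hypothetical stretch of l-strand sequence, for illustration only.
l_strand = "ATGAGACATATT"
r_strand = reverse_complement(l_strand)

# A rightward transcript is copied from the r-strand template,
# so its sequence matches the l-strand (with U in place of T).
mrna = transcribe_from_template(r_strand)
print(mrna)
```

This is why the coding sequences of r-strand transcripts are read directly from the l-strand sequence in the figures and in Appendix B.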



A. Early Region E1 (1.3-11.2)

Early region E1 is transcribed from the leftmost part of the viral
r-strand. It contains genes involved in cell transformation (Graham et al.,
1974a,b; van der Eb et al., 1979) and regulation of transcription (Berk et
al., 1979; Jones and Shenk, 1979a; Nevins, 1981). The complete nucleotide
sequence of this region has been established for human serotypes
Ad2, Ad5, Ad7, and Ad12 (van Ormondt et al., 1978, 1980a,b; Sugisaka
et al., 1980; Dijkema et al., 1980a,b, 1981; Bos et al., 1981; Kimura et
al., 1981; Gingeras et al., 1982). The overall organization of this region
appears to be very similar for the different serotypes (van Ormondt et al.,
1980b; Dijkema et al., 1982). The region between 1.3 and 11.2 m.u. can
be subdivided into three transcription units designated E1a, E1b, and
region IX (Kitchingman et al., 1977; Berk and Sharp, 1977a, 1978; Chow
et al., 1979a,b). The mRNAs derived from region E1 have been
characterized by EM mapping, in vitro translation, and sequence analysis.
It appears that all mRNAs except protein IX mRNA have a spliced structure
and code for a variety of proteins, some of which are structurally related.

1. Early Region E1a (1.3-4.6)

Early region E1a is transcribed from the r-strand between 1.3 and 4.6
m.u. and codes for proteins that are involved in initiation of
transformation (van der Eb et al., 1979) and regulation of early gene
expression (Jones and Shenk, 1979a; Berk et al., 1979) (see Fig. 3). The
promoter of this region has been mapped at position 1.3 (Wilson et al.,
1979). Analysis of the Ad2 sequence reveals that at position 468 [see
Fig. 18 (Appendix B)], the TATA box TATTTATA is present. Baker and Ziff
(1980, 1981) have characterized the position where transcription of the
E1a RNA is initiated: They found that all mRNAs start with a capped dAMP
residue derived from position 499. Three mRNA species have been identified
from region E1a with sedimentation coefficients of 13, 12, and 9 S. These
mRNAs share the same 5' and 3' termini and differ only in the size of
the RNA fragment removed by splicing during the processing of nuclear
RNA (Kitchingman et al., 1977; Berk and Sharp, 1977a, 1978; Chow et
al., 1979a,b; Perricaudet et al., 1979). The splice points of the 13 S mRNA
have been mapped at nucleotide positions 1112 and 1227 and of the 12
S mRNA at positions 974 and 1227 (Perricaudet et al., 1979). The donor
splice site of the 9 S mRNA species has not been determined yet. The 3'
ends of the mRNAs are located at nucleotide position 1630, while the
polyadenylation signal AATAAA is found at position 1609 (Perricaudet
et al., 1979; Fraser et al., 1982).

FIGURE 3A-C. Structural organization of the region between coordinates 0.0 and 31.7 on
the Ad2 genome. The analysis of the structural organization is based on the nucleotide
sequence shown in Fig. 18 (Appendix B), and indicated positions refer to this sequence. The
l-strand of the DNA is homologous to r-strand transcripts, while the r-strand is homologous
to l-strand transcripts. Here and in Figs. 4-6 and Appendix B: Termination codons (TAA,
TGA, and TAG) are indicated in the three frames of the l- and r-strands by short vertical
lines, while the initiation codon ATG is indicated by a separate symbol. The coding regions
that have been correlated with known proteins are shown by bold lines and are designated
by molecular weights of the corresponding proteins or by roman numerals. Unidentified
reading frames [(URF) initiating with ATG and terminating with one of the termination
codons] or open reading frames [(ORF) regions between two termination codons] longer than
300 nucleotides are also indicated. Between the scales for Map units and Base pairs, the
positions of TATA boxes, polyadenylation signals, and leader sequences are indicated. At
some positions along the genome, splicing may occur. These positions are indicated by
interrupted lines.

FIGURE 4. Structural organization of the region between coordinates 49.0 and 51.8 on the
Ad2 genome. This analysis is based on the nucleotide sequence shown in Fig. 19 (Appendix
B). This region mainly codes for the precursor of polypeptide VI. For explanation of the
symbols, see the Fig. 3 caption.

FIGURE 5. Structural organization of the region between coordinates 59.9 and 71.4 on the
Ad5 genome. This analysis is based on the nucleotide sequence shown in Fig. 24 (Appendix
B). This region codes for a 23K protein, DNA-binding protein (DBP), and a part of the 100K
protein. For explanation of the symbols, see the Fig. 3 caption.

Since the reading frames in the E1a mRNAs are the same, the proteins
derived from these mRNAs share their N-terminal and C-terminal
segments and differ only in the number of intervening amino acids. From
the DNA sequence, the complete amino acid sequences of the proteins 
specified by the 13 and 12 S mRNA species can be predicted. Both proteins 
must be rich in Pro and Glu residues and have theoretical molecular 
weights of 32 and 26K, respectively. The protein derived from the 9 S 
mRNA has an estimated molecular weight of 13K. These proteins have 
been correlated with proteins produced during cell-free translation of iso- 
lated mRNAs (Lewis et al., 1976; Pettersson and Mathews, 1977; Harter 
and Lewis, 1978; Green et al., 1979a; Esche et al., 1980; Spector et al.,
1980a,b; van der Eb et al., 1979; Lupker et al., 1980). These translation 
products with apparent molecular weights of 48-58, 42-54, and 28K are 
structurally related, which is in agreement with the nucleotide sequence 
of this region. The discrepancy between the theoretical and apparent
molecular weights probably reflects the extremely high Pro content of these
proteins, which leads to aberrant migration in gels.

FIGURE 6A-C. Structural organization of the regions between coordinates 71.2 and 100.0
on the Ad2 genome. This analysis is based on the nucleotide sequence shown in Fig. 21
(Appendix B). For explanation of the symbols, see the Fig. 3 caption.

As mentioned before, the E1a regions of Ad2, Ad5, Ad7, and Ad12
show very similar organization. In all serotypes, three spliced mRNA
species are synthesized. Recently, it was shown that the protein encoded 
by the 13 S mRNA governs early gene expression (Montell et al., 1982). 

2. Early Region E1b (4.6-11.2)

Early region E1b is transcribed from the viral r-strand between map
coordinates 4.6 and 11.2 [see Figs. 3 and 18 (Appendix B)]. The proteins
encoded by this region are involved in transformation and play an
important role in oncogenesis; during lytic infection, these proteins are
involved in DNA replication (Harrison et al., 1977; Frost and Williams,
1978; Jones and Shenk, 1979a,b; van der Eb et al., 1979; Bernards et al.,
1982; van den Elsen et al., 1982). Little is known about the precise role
of these proteins. Studies of cells transformed by DNA fragments of
different length have suggested that region E1a is able to immortalize
cells, while region E1b is required for full expression of the typical
phenotype of adenovirus-transformed cells (van der Eb et al., 1979;
Houweling et al., 1980).

The promoter of early region E1b is located at map position 4.6,
where, at nucleotide 1670, a Goldberg-Hogness box TATATAA is found
(Fig. 18). Transcription may start at position 1700 or 1702 (Baker and Ziff,
1981) and proceeds until nucleotide 4061 (Perricaudet et al., 1980; Fraser
et al., 1982). The polyadenylation signal of region E1b is located at
nucleotide 4030. The primary transcription product of region E1b is
processed by splicing into a 22 and a 13 S mRNA species. Both species share
a 3'-terminal segment from nucleotide 3590 to a polyadenylation site at
nucleotide 4061. Both species also contain a 5'-terminal sequence from
1700 or 1702 to a donor splice site at nucleotide 2250. In the 13 S mRNA,
nucleotide 2250 is joined to an acceptor splice site at 3590, whereas the
22 S mRNA continues from nucleotide 2250 to a second donor splice site at
nucleotide 3505. Nucleotide 3505 of the 22 S mRNA is ligated to the
common acceptor splice site at nucleotide 3590. From this point, the
mRNA sequence continues to the polyadenylation site near nucleotide
4061 (Perricaudet et al., 1980; Alestrom et al., 1980). In vitro translation
experiments have shown that two major proteins with molecular weights
of 55-65 and 15-19K can be assigned to this transcription unit (Lewis et
al., 1976; Harter and Lewis, 1978; van der Eb et al., 1979; Brackmann et
al., 1980). This observation is in agreement with the fact that the two
mRNA species contain information for two major tumor (T) antigens
with theoretical molecular weights of 21 and 55K, which are encoded by
two overlapping reading frames. The 22 S mRNA codes for both proteins
depending on which particular ATG triplet serves as the start codon. The
21K protein initiates at the 5'-proximal ATG (position 1712), while the
55K protein initiates at the second ATG (nucleotide 2017) in another
reading frame (Anderson and Lewis, 1980; Bos et al., 1981). In addition,
the 21K protein can also be synthesized from the 13 S mRNA. Peptide
mapping has shown that the small-t and the large-T antigens do not share
tryptic peptides, in accordance with the nucleic acid sequence data (Bos
et al., 1981).
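The splice pattern of the two E1b mRNAs described above can be tabulated in a short Python sketch. It uses only the nucleotide positions quoted in the text and assumes, for illustration, that the quoted positions are inclusive exon boundaries, that transcription starts at 1700 rather than 1702, and that the poly(A) tail is ignored; the names are introduced here and are not from the source.

```python
# Landmark positions quoted in the text (Ad2 numbering, Fig. 18).
START, DONOR_1, DONOR_2, ACCEPTOR, POLY_A_SITE = 1700, 2250, 3505, 3590, 4061

# Exons of each mRNA as (first nucleotide, last nucleotide), assumed inclusive.
E1B_MRNAS = {
    "13S": [(START, DONOR_1), (ACCEPTOR, POLY_A_SITE)],
    "22S": [(START, DONOR_2), (ACCEPTOR, POLY_A_SITE)],
}


def spliced_length(exons):
    """Total length of the joined exons, without the poly(A) tail."""
    return sum(end - start + 1 for start, end in exons)


def intron_length(exons):
    """Length of the single intervening sequence removed by splicing."""
    (_, donor), (acceptor, _) = exons
    return acceptor - donor - 1


for name, exons in E1B_MRNAS.items():
    print(name, spliced_length(exons), intron_length(exons))
```

Under these assumptions the two species differ only in the segment between nucleotides 2250 and 3505, which is retained in the 22 S mRNA and removed from the 13 S mRNA.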

Similar organization of region E1b has been found for Ad2, Ad7, and
Ad12 (Bos et al., 1981; Kimura et al., 1981; Dijkema et al., 1982; Gingeras
et al., 1982). This does not exclude small differences between mRNAs
from different serotypes. Comparison of the E1b mRNAs of Ad5 and Ad12
has revealed that the Ad12 mRNA contains additional splices in the 3'
noncoding part of the mRNA (Virtanen et al., 1982a). The precise func- 
tions of the 21K and 55K proteins are still unknown. 

The 22 and 13 S mRNAs both contain information for protein IX, a
protein that has been mapped between 9.7 and 11.2 map units (Chow et
al., 1977b; Pettersson and Mathews, 1977; Esche et al., 1980). However,
this information is not translated from these messengers. Instead, a
unique short mRNA is synthesized from an independent transcription
unit between coordinates 9.7 and 11.2 (Wilson et al., 1979; Chow et al.,
1977a,b; Pettersson and Mathews, 1977). The sequences of the genes that
encode the Ad2 and Ad5 polypeptides IX have been established, which
allowed the identification of transcription and translation signals (Maat
et al., 1980; Alestrom et al., 1980). The polypeptide IX TATA box is
located at position 3546, and transcription starts at nucleotide position
3575 or 3577 (map position 9.7) in the Ad2 sequence [Fig. 18 (Appendix
B)]. Its 3' end has been located at nucleotide position 4061 (map position
11.2) (Alestrom et al., 1980; Fraser et al., 1982), while the polyadenylation
signal AATAAA is located at position 4030. The same polyadenylation
signal is also used for processing of the large and the small E1b T antigen
mRNAs. The RNA synthesized is not processed and represents the only
known unspliced adenovirus mRNA. The mRNA contains a continuous
open reading frame that codes for a protein of 14K. Protein IX (apparent
molecular weight 12.5K) is found in virions and was therefore originally
classified as a late protein (Pettersson and Mathews, 1977). Later
experiments showed that protein IX is also synthesized in the absence of
viral DNA replication, indicating that it is an intermediate protein (Persson
et al., 1978). The complete nucleotide sequence of the polypeptide IX
gene has been determined for human serotypes Ad2, Ad3, Ad5, Ad7, and
Ad12 (Maat et al., 1980; Alestrom et al., 1980; Dijkema et al., 1981;
Kimura et al., 1981; Engler, 1981). Within the same group, the protein
IX genes exhibit a striking similarity, but the genes of serotypes from
different groups are much less homologous.

3. Unidentified Reading Frames 

In the l-strand transcripts, a number of unidentified reading frames
(URFs) have been detected. The URFs larger than 300 nucleotides are
indicated in Figs. 3 and 18 (Appendix B). However, it was recently
shown that in transformed cells and infected cells, an l-strand transcript
is synthesized that spans the E1a-E1b junction and codes for a protein
with a molecular weight of 11K (Katze et al., personal communication).
This transcript might very well be derived from URF 11 located between
nucleotides 1713 and 1197 on the viral l-strand. At position 443, the
sequence AATAAA is found, which might function as a polyadenylation
signal. This finding suggests that some of the other URFs may yet prove
to be expressed during the infection cycle, albeit at a very low frequency.

B. Late and Intermediate Genes in the Region between 
Coordinates 11.2 and 31 

1. Major Late Promoter and Tripartite Leader 

The region between 11.2 and 31 contains a mosaic of different
strategic regions in both complementary strands [see Figs. 3 and 18
(Appendix B)]. The major late promoter has been mapped on the r-strand at
position 16.5 (Evans et al., 1977; Ziff and Evans, 1978). This promoter is
also active early in infection (Shaw and Ziff, 1980; Akusjarvi and Persson,
1981b). In the nucleotide sequence at this position, there is a TATA box
TATAAA at nucleotide position 6006, and transcription starts from position
6037 (Baker and Ziff, 1981). During early times in infection, transcription
proceeds no further than map position 39, while at late times, transcription
proceeds to map position 99.0 (Fraser et al., 1979). Messenger RNAs
derived from r-strand transcripts starting at position 16.5 contain a common
tripartite leader (Berget et al., 1977, 1978; Chow et al., 1977a,b; Akusjarvi
and Pettersson, 1979a,b; Zain et al., 1979a,b; Ziff and Evans, 1978). The
sequence of the tripartite leader of late Ad2 RNA has been determined
by sequencing complementary DNA (cDNA) transcribed from hexon
mRNA and a cDNA clone of fiber mRNA (Zain et al., 1979a; Akusjarvi
and Pettersson, 1979b). The tripartite leader sequences have been
established for a number of serotypes [Ad2 (Ziff and Evans, 1978; Akusjarvi
and Pettersson, 1979a; Zain et al., 1979a), Ad5 (van Beveren et al., 1981),
Ad3 and Ad7 (Engler et al., 1981)].
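Map positions and nucleotide positions are quoted side by side throughout this section, and the relation between the two scales can be sketched in Python. The landmark pairs below are those given in the text for the left end of Ad2 (Fig. 18 numbering); fitting a single scale factor through the origin is an assumption made here for illustration, and the resulting value is only an estimate, not a figure from the source.

```python
# (map position in map units, nucleotide position) pairs quoted in the text.
LANDMARKS = [
    (1.3, 499),    # E1a cap site
    (4.6, 1700),   # E1b transcription start
    (9.7, 3575),   # polypeptide IX transcription start
    (11.2, 4061),  # 3' end of E1b and polypeptide IX mRNAs
    (16.5, 6037),  # major late transcription start
]


def fit_bp_per_map_unit(pairs):
    """Least-squares slope of nucleotide position on map position (no intercept)."""
    numerator = sum(m * n for m, n in pairs)
    denominator = sum(m * m for m, _ in pairs)
    return numerator / denominator


BP_PER_MAP_UNIT = fit_bp_per_map_unit(LANDMARKS)


def map_units_to_nucleotide(map_units):
    """Approximate nucleotide position for a map position."""
    return round(map_units * BP_PER_MAP_UNIT)


def nucleotide_to_map_units(nucleotide):
    """Approximate map position for a nucleotide position."""
    return nucleotide / BP_PER_MAP_UNIT


print(round(BP_PER_MAP_UNIT, 1), map_units_to_nucleotide(16.5))
```

Because map positions are given to one decimal place only, a conversion of this kind cannot be more precise than a few tens of base pairs; exact landmarks should always be taken from the sequence itself.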

The overall length of the Ad2 tripartite leader is 203 nucleotides, 
comprising 41 nucleotides from the promoter region at map position 16.5, 
72 nucleotides from position 19.6, and 90 nucleotides from position 26.5 
on the genome. Examination of the sequence reveals that the tripartite 
leader does not contain an AUG triplet, suggesting that translation of 
late adenoviral mRNA does not initiate within the tripartite leader. In 
some intermediate and late transcripts, an additional leader fragment (i- 
leader) has been detected by R-loop mapping, which maps at coordinates 
21.5-23.0 (Chow et al., 1979a). Sequence analysis has shown that in
contrast to the tripartite leader, the i-leader (nucleotides 7940-8379)
contains an open reading frame for a hypothetical protein of 15.9
kilodaltons (kd). In vitro translation of mRNA selected on DNA fragments
that contain i-leader sequences does indeed lead to synthesis of a hitherto
unknown protein (URF2) with an apparent molecular weight of 13.6-16K
(Lewis et al., 1979; Lewis and Mathews, 1980; Virtanen et al., 1982b). The
termination codon for the 15.9-kd protein is not present in the i-leader, but
is probably located within the third leader. The function of the 15.9-kd
protein is still unknown.
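The leader lengths quoted above can be checked with a few lines of Python. The segment sizes and the i-leader coordinates are taken from the text; treating the i-leader coordinates as inclusive is an assumption made here.

```python
# Tripartite leader segments: (map position, length in nucleotides).
TRIPARTITE_LEADER = [(16.5, 41), (19.6, 72), (26.5, 90)]

# Overall length of the spliced leader.
leader_length = sum(length for _, length in TRIPARTITE_LEADER)

# i-leader coordinates in the Ad2 sequence, assumed inclusive.
I_LEADER = (7940, 8379)
i_leader_length = I_LEADER[1] - I_LEADER[0] + 1

print(leader_length, i_leader_length)
```

The three segments add up to the 203 nucleotides stated for the Ad2 tripartite leader, and the i-leader is roughly twice as long as the tripartite leader as a whole.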

2. Virus-Associated RNAs 

At positions 28.8 and 29.5 on the genome, the genetic information
for two low-molecular-weight RNAs is located, these RNAs being
designated virus-associated (VA) RNAs VA-RNAI and VA-RNAII (Soderland
et al., 1976; Mathews and Pettersson, 1978) (Fig. 3). In contrast to all
other genes, the VA genes are transcribed by RNA polymerase III instead
of RNA polymerase II (Price and Penman, 1972; Weinman et al., 1974,
1976; Soderland et al., 1976). The VA-RNAs are probably synthesized
from two separate promoter sites in the r-strand and do not undergo
posttranscriptional processing. The genes and the RNA products have been
subjected to nucleotide sequence analysis (Ohe and Weissman, 1970,
1971; Ohe, 1972; Pan et al., 1977; Celma et al., 1977a,b; Akusjarvi et al.,
1980). The nucleotide sequence of VA-RNAI was determined by Ohe and
Weissman (1971) to be 157-160 nucleotides long (nucleotides
10,608-10,764/10,767). Vennström et al. (1978a,b) demonstrated that the
5' end of VA-RNAI is heterogeneous and may start at nucleotide 10,605 or
10,608 [Fig. 18 (Appendix B)]. The length of VA-RNAII is 158-163
nucleotides (nucleotides 10,864-11,021/11,026), and the two VA-RNAs are
separated by a spacer about 98 nucleotides long. The function of these
RNAs is still unknown; so far, no proteins derived from them have been
found. It has been suggested that these RNAs play a role in splicing or
stabilization of late mRNA (Murray and Holliday, 1979; Mathews, 1980).
It is interesting to note that the VA-RNAs can form almost identical
secondary structures with high stability. The structures show similarities
to transfer RNA (Zain et al., 1979b; Akusjarvi et al., 1980).
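The VA-RNA lengths and the spacer size follow directly from the coordinates quoted above, as the Python sketch below shows. It assumes inclusive coordinates and uses the 10,608 start for VA-RNAI; the variable names are introduced here for illustration.

```python
def inclusive_length(start, end):
    """Number of nucleotides from start to end, counting both ends."""
    return end - start + 1


# Start position and the two alternative 3' ends quoted in the text.
VA_RNA_I = {"start": 10608, "ends": (10764, 10767)}
VA_RNA_II = {"start": 10864, "ends": (11021, 11026)}

va1_lengths = tuple(inclusive_length(VA_RNA_I["start"], e) for e in VA_RNA_I["ends"])
va2_lengths = tuple(inclusive_length(VA_RNA_II["start"], e) for e in VA_RNA_II["ends"])

# Nucleotides lying strictly between the end of VA-RNAI and the start of VA-RNAII.
spacer_lengths = tuple(VA_RNA_II["start"] - e - 1 for e in VA_RNA_I["ends"])

print(va1_lengths, va2_lengths, spacer_lengths)
```

The computed ranges reproduce the 157-160 and 158-163 nucleotide lengths given in the text, and the spacer comes out at just under 100 nucleotides depending on which 3' end of VA-RNAI is used.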

3. Early Region E2b and Protein IVa2 (11.2-30.2)

For a long time, it was thought that the l-strand transcripts
between map units 11 and 30 coded only for the intermediate protein
IVa2 (molecular weight 50K), a protein that is involved in the
morphogenesis of virions (Persson et al., 1979a). The gene of this protein
has been mapped between coordinates 11.3 and 16.1 (Lewis et al., 1975,
1977) [see Figs. 3 and 18 (Appendix B)]. Transcription of the IVa2 gene
starts from a promoter located at map position 16.1. Nucleotide sequences
of this region reveal that although no regular TATA box is located in this
region, the sequence TCCTT, which may resemble a TATA box, is present
at nucleotide 5859. RNA synthesis starts at position 5826 or 5824
and proceeds to nucleotide 4051 (Alestrom et al., 1980; Baker and Ziff,
1981; Fraser et al., 1982) (Fig. 18). The messengers from this region
contain an intron located between nucleotides 5419 and 5693 (Chow et al.,
1977a,b; Broker et al., 1977; Kilpatrick et al., 1979; van Beveren et al.,
1981). The mRNA contains a long open reading frame (ORF) corresponding
to 445 amino acids of which the first 4 N-terminal amino acids are
coded by RNA upstream from the donor splice site and the remaining
amino acid residues by RNA downstream from the acceptor splice site.
It is noteworthy that the reading frame in which these 4 N-terminal
amino acids lie is part of a much longer reading frame that codes for a
protein of 120 kd (see below). Another interesting feature of the IVa2 gene
is that the 3' end of the message overlaps the end of the E1b and
polypeptide IX mRNAs by 9 nucleotides. Also, the IVa2 termination codon
TAA (nucleotide 4084) forms a part of the IVa2 polyadenylation signal
AATAAA (nucleotide 4086). The IVa2 genes of serotypes Ad2, Ad5, and
Ad7 have all been sequenced and show the same structural organization
(van Beveren et al., 1981; Engler and van Bree, 1982; Gingeras et al., 1982;
Alestrom et al., 1982b). The IVa2 nucleotide sequences of Ad7 and Ad5
are 78% homologous.

A new class of mRNAs from the region between 11 and 30 m.u. was
identified by Stillman et al. (1981). The promoter of these transcripts has
been mapped at position 75.1 and is probably identical to the promoter
of early region E2a. Transcripts of this region, which is designated E2b,
contain, in addition to the 75.1-m.u. leader, additional leaders from 68.5
and 39 m.u. Region E2b has been classified as an intermediate
transcription unit (Fig. 3). The main bodies of messages derived from this
transcription unit may start at positions 30, 26, and 23, respectively, and
continue to position 11.2. In vitro translation of preselected mRNAs
derived from the region between 11.2 and 31.5 led to synthesis of proteins
with molecular weights of 105, 87, and 75K (Stillman et al., 1981; Binger
et al., 1982). The 87K protein is identical to the precursor terminal protein
(pTP) with a molecular weight of 80K described by Challberg et al. (1980)
(see Section VI). Nucleotide sequence analysis of this region has indicated
the presence of two long ORFs located between 28.9 and 23.5 m.u. and
24.1 and 14.2 m.u. [Fig. 18 (Appendix B)]. The ORF between 28.9 and
23.5 m.u. begins at nucleotide 10,577, has the first ATG at nucleotide
10,532, and continues to a terminator at nucleotide 8573. This frame codes
for a protein with a minimum molecular weight of 74.5K. The second
large ORF begins at nucleotide 8793, has the first ATG at 8355, and
continues to a terminator TAG at nucleotide 5190. The total coding
capacity of this reading frame is 132.1kd, while the capacity from the first
ATG to the terminator is 120.4kd (Gingeras et al., 1982; Alestrom et al.,
1982; Engler et al., 1983). Since the precise structure of the spliced E2b
mRNAs is still unknown, it cannot be excluded that a part of the leader
from map position 39 is part of the coding sequences of E2b mRNAs. EM
mapping of E2b mRNAs has indicated that the 3' ends of the messengers
map at position 11.2, the same position where the 3' end of IVa2 mRNA
is located. It is therefore likely that the mRNAs of pTP and the 120kd
polypeptide have the same 3' end and polyadenylation site as the IVa2
mRNA (Alestrom et al., 1980; Stillman et al., 1981). Smart and Stillman
(1982) showed by analysis of tryptic peptides from the terminal protein
and its precursor that the ORF between 28.9 and 23.5 codes for pTP. Very
recently, the ORF from 24.1 to 14.2 was assigned to an adenovirus-specific
DNA polymerase (Kelly, Stillman, and Hurwitz, personal
communications). This polymerase has an apparent molecular weight of
140K, copurifies with pTP, and is able to complement a defective in vitro
DNA replication system of the DNA-synthesis-negative temperature-sensitive
(ts) mutant Ad5ts36 (Enomoto et al., 1981; Lichy et al., 1982; Kelly and
Stillman, personal communications). The mutant Ad5ts36 has been
mapped between 18.5 and 22.0 m.u. (Galos et al., 1979). In addition to
these two proteins, all E2b messengers contain genetic information for
the IVa2 protein, but this information is probably not translated from the
E2b messengers.

4. Unidentified Reading Frames 

Several unidentified shorter reading frames are present in this region 
of the viral genome (Fig. 3). However, no correlation with known proteins 
or gene functions has been discovered yet. In this respect, it should be 
noted that translation in vitro of early mRNA selected by hybridization 
to fragments of DNA derived from this region has identified mRNA spe- 
cies that encode additional proteins (Lewis and Mathews, 1980). A DNA 
fragment from 17.0 to 21.5 m.u. selects an mRNA that is complementary 
to the r-strand and codes for a 13.5-kd protein (Lewis et al., 1979; Lewis 
and Mathews, 1980). Further, two polypeptides of 16.5 and 17.0kd have 
been described, translated from mRNAs that are selected by DNA frag- 
ments lying between 11.6 and 17.0 m.u. (Lewis et al., 1979). 



C. Late Regions L1, L2, and L3 (31.0-61.7)

A major event in the infection cycle of adenoviruses is the activation 
of the entire late transcription unit. As mentioned in Section VII.B.1, the
promoter of the late transcription unit is located at map position 16.5, 
and this promoter is already active early in infection. However, during 
the early phase, transcription does not proceed further than map position 
39 (Shaw and Ziff, 1980; Akusjarvi and Persson, 1981b). In the late phase, 
transcription continues to map position 99.0 (Fraser et al., 1979, 1982). 
The transcription product ranging from map positions 16.5 to 99.0 is
considerably processed, leading to the production of five families of late
mRNAs (L1-L5) (Chow et al., 1977b; McGrogan and Raskas, 1978; Chow
and Broker, 1978; Nevins and Darnell, 1978). Each of the five classes
expresses more than one protein and contains mRNAs with a common
3' end (Ziff and Fraser, 1978; Nevins and Darnell, 1978; Fraser and Ziff,
1978). At the 5' end, all these mRNAs contain the tripartite leader. 

The region on the Ad2 genome between 30.2 and 61.7 m.u. contains
the genes for the families L1-L3. As mentioned above, the L1 family of
RNAs is already expressed early in infection. This family consists of three
mRNAs that have a common 3' end mapping at 39 m.u. At the same
position, the polyadenylation site of the L1 family has been mapped
(Fraser et al., 1979, 1982). The L1 mRNAs code for two structurally related
proteins of 52 and 55K (Lewis and Mathews, 1980; Miller et al., 1980)
and polypeptide IIIa (molecular weight 66K). Since nucleotide sequences
from the left-hand end of Ad2 DNA have not been established further
than position 31.5, only the initiation codon of the 52,55K protein has
been identified unambiguously (Akusjarvi et al., 1980). The function of
the 52,55K protein is still unknown. The L1 family further contains
genetic information for protein IIIa, which has been mapped by
hybrid-arrest translation between 34.3 and 39.3 m.u. This protein has a
molecular weight of 66K and is present in virions associated with the
hexon polypeptides.

Located from positions 39 to 50 is the L2 family, consisting of three 
mRNA species that code for polypeptide III (molecular weight 85K), the 
precursor of polypeptide VII (20K), and polypeptide V (48.5K). These pro- 
teins are all constituents of adenovirus particles. One of these, the pre- 
cursor of polypeptide VII, is processed during maturation of virions to 
mature polypeptide VII (molecular weight 18.5K). This protein is identical 
to the major core protein. The genes for protein III, the precursor of protein 
VII, and protein V have been mapped by R-loop mapping and hybrid-arrest 
translation at 37.4-43.9, 43.9-45.4, and 45.3-49.6, respectively (Miller 
et al., 1980). 

Fraser et al. (1982) have mapped the polyadenylation site of the L2 
family at position 50. This fits well with the fact that in the nucleotide 
sequence from the region between coordinates 49.0 and 51.8 [Fig. 19 (Ap- 
pendix B)], the polyadenylation site of the L2 family has been identified 
at nucleotide 92, while an AATAAA signal is present at nucleotide 72 
(Akusjarvi and Persson, 1981a). 
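
The spacing between a polyadenylation signal and the mapped addition site can be checked with simple arithmetic. The sketch below uses only nucleotide positions quoted in this section (L2: signal at 72, site at 92; L3: signal at 812, site at 836; E3: signal at 4136, site at 4148). The dictionary layout and the function name are illustrative assumptions, not part of any published analysis.

```python
# Positions as quoted in the text; each pair is (signal position, poly(A) site).
# The figure labels only identify which sequence numbering is being used.
SITES = {
    "L2 (Fig. 19)": (72, 92),
    "L3 (Fig. 20.1)": (812, 836),
    "E3 (Fig. 21)": (4136, 4148),
}


def signal_to_site_distance(signal_position: int, polya_site: int) -> int:
    """Nucleotides between the quoted signal position and the addition site."""
    return polya_site - signal_position


distances = {name: signal_to_site_distance(*pair) for name, pair in SITES.items()}

for name, distance in distances.items():
    print(f"{name}: signal lies {distance} nucleotides upstream of the poly(A) site")
```

In all three cases the hexanucleotide lies within about 30 nucleotides of the addition site, which is the spacing the chapter associates with polyadenylation signals.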

The nucleotide sequence data from region 49.0-51.8 make it possible 
to pinpoint exactly some landmarks of the L3 family of late mRNAs (see 
Figs. 4 and 19). Three species of mRNAs have been identified that can 
be translated into the precursor of polypeptide VI (pVI), hexon (polypep- 
tide II), and a 23K protein. The gene for polypeptide pVI is located from 
49.1 to 51.2 and has been sequenced completely (Miller et al., 1980; Ak- 
usjarvi and Persson, 1981a). Also, the acceptor splice site at which the 
5' leader sequences are joined to the pVI message has been determined 
(nucleotide 123) (Fig. 19). This splice site is situated very close to the 
start codon (nucleotide 124). The gene for polypeptide pVI codes for a 
protein with a theoretical molecular weight of 27K. This protein is 
cleaved during maturation of young virions, resulting in the formation 
of polypeptide VI (molecular weight 24K), which is part of the adenovi- 
rion. With the help of nucleotide sequence analysis, the N-terminal end 
of the hexon polypeptide has been mapped at coordinate 51.6, while the 
C terminus is located at 59.7 (Akusjarvi and Pettersson, 1978a,b). The 
hexon polypeptide is translated from start codon 961 of an mRNA that 
contains, in addition to the tripartite leader, a main body starting at nu- 
cleotide 925 in the sequence of Fig. 19 (Appendix B) to nucleotide 836 in 
the sequence of Fig. 20.1. The common polyadenylation site of the L3 
RNAs has been mapped at the same position. In accord with other po- 
lyadenylation sites, the sequence AATAAA is located close to this ad- 
dition site (nucleotide 812) (Fig. 20.1). The total nucleotide sequence of 
the hexon gene has not been established yet; only stretches of nucleotides 
have been determined (Jornvall et al., 1981b). However, by combination 
of nucleotide sequence and amino acid sequence data, the complete 
amino acid sequence of the Ad2 hexon polypeptide has been established 
(Jornvall et al., 1981a). It appears that the hexon polypeptide of Ad2 con- 
sists of 966 amino acid residues. It is the largest viral protein and has a 
calculated molecular weight of 108K and an apparent molecular weight 
of 120K. 
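
As a rough plausibility check, a calculated molecular weight can be compared with the residue count. The sketch below assumes an average residue mass of 110 daltons, a common rule of thumb that is not taken from this chapter, and applies it to three residue counts quoted in this section (hexon, and the fiber and E3 16-kd proteins described further on). The function name is hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons.
# This value is an assumption for illustration; real proteins deviate from it.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mw_kilodaltons(n_residues: int) -> float:
    """Crude molecular weight estimate (in K) from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# name -> (residue count, calculated molecular weight in K quoted in the text)
QUOTED = {
    "hexon": (966, 108.0),
    "fiber": (582, 61.9),
    "E3 16-kd protein": (159, 18.4),
}

estimates = {name: estimate_mw_kilodaltons(n) for name, (n, _) in QUOTED.items()}

for name, (n, quoted) in QUOTED.items():
    print(f"{name}: {n} residues, estimate {estimates[name]:.1f}K, quoted {quoted}K")
```

The estimates stay within a few percent of the quoted calculated values. The larger apparent molecular weights (for example 120K for hexon) come from gel mobility and are not expected to match this arithmetic.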

From positions 59.9 to 61.7, r-strand transcripts code for a protein of 
molecular weight 23K (Kruijer et al., 1980; Akusjarvi et al., 1981) [see 
Figs. 5 and 20.2 (Appendix B)]. A minor RNA species consisting of the 
tripartite leader and a main body corresponding to this region has been 
identified and translated. A protein with a molecular weight of 23K is 
synthesized from this messenger. Since the Ad2 mutant ts1 has been 
mapped in the L3 region and is hampered in proteolytic cleavage of pre- 
cursors of polypeptides VI, VII, and VIII, it has been suggested that the 
23K protein is identical to a virus-coded protease (Bhatti and Weber, 1979). 

D. Early Region E2a (61.5-75.1) 

Early region E2a codes for the single-strand-specific, DNA-binding 
protein (DBP) (Figs. 5 and 6). This protein, discovered by van der Vliet 
and Levine (1973), is phosphorylated, has an apparent molecular weight 
of 72K, and is involved in DNA replication, in regulation of early and 
late gene expression, and in cell transformation (Ginsberg et al., 1974; 
van der Vliet et al., 1975, 1977; van der Vliet and Sussenbach, 1975; Carter 
and Ginsberg, 1976; Horwitz, 1978; Mayer and Ginsberg, 1977; Carter 
and Blanton, 1978; Nevins and Jensen-Winkler, 1980; Klessig and Grod- 
zicker, 1979). The DBP genes of Ad2 and Ad5 have been analyzed in most 
detail. Therefore, the positions of strategic signals in the DBP gene are 
described in these sequences [Figs. 21 and 24 (Appendix B)]. It should be 

polypeptide VIII (molecular weight 26K) (75.5-77.3) (Figs. 5 and 6). The 
indicated map positions have been determined by hybrid-arrest transla- 
tion (Miller et al., 1980). Polypeptide VIII (molecular weight 13K) is pro- 
duced by proteolytic cleavage of its precursor during maturation of virions 
and is in virions associated with the hexon capsomers. The 100-kd protein 
is involved with folding of the hexon polypeptide chains into trimers 
(Ginsberg, personal communication), while the function of the 33-kd pro- 
tein is still unknown. The four mRNAs that code for these proteins form 
the L4 family of late mRNAs and share the 3'-terminal sequences. The 
common polyadenylation site has been mapped at 78 map units. 

Nucleotide sequences of this region have been determined in Ad2 
and Ad5 DNA (Galibert et al., 1979; Herisse et al., 1980; Kruijer et al., 
1981, 1982). Therefore, the strategic landmarks of the L4 proteins can be 
indicated at the nucleotide level. The acceptor splice point of the Ad5 
100-kd polypeptide has been determined by reverse transcription of 100- 
kd mRNA and is located at nucleotide 2316 [Fig. 24 (Appendix B)] (Kruijer 
et al., 1983). The polyadenylation site of the L4 mRNAs is mapped close 
to the sequence AATAAA at nucleotide 2572 [Fig. 21 (Appendix B)] (Fraser 
et al., 1982). Comparison of the Ad5 sequence, which extends to coor- 
dinate 71.4, with the sequence of Ad2 indicates that nucleotides 3855- 
4107 of the Ad5 sequence (Fig. 24) are colinear with nucleotides 1-253 
of the Ad2 sequence (Fig. 21). The frames in the overlapping sequences 
are identical and code, with a single exception, for identical amino acids. 
Using the combined sequences, it is possible to construct a hybrid 100- 
kd protein consisting of an amino-terminal part from Ad5 and a carboxy- 
terminal part of Ad2. The hypothetical hybrid protein consists of 805 
amino acids and has an actual molecular weight of 89K. 

The coding sequences of the 100- and 33-kd proteins partially overlap. 
However, since these proteins do not share tryptic peptides (Gambke and 
Deppert, 1981), it is most likely that they are encoded by r-strand tran- 
scripts in different ORFs. While the information for the 100-kd protein 
terminates at nucleotide 890, two ORFs (ORFs 1 and 2) can be distin- 
guished in the other two reading frames, viz., ORF 1 from nucleotides 
306 to 1191 (between stop codons 303 and 1191) and ORF 2 from nu- 
cleotides 1006 to 1492 (between stop codons 1003 and 1492) (Fig. 21). An 
ATG is present at nucleotide 411. Since one of the L4 mRNAs contains 
an internal splice that maps reasonably well in the region where these 
two ORFs overlap, it is likely that these regions code for the 33-kd protein. 
However, this has still to be proved by experimental data. One of the 
three short additional leaders for the fiber mRNA (x-leader) is also tran- 
scribed in this region from the r-strand (77.2-77.6). The x-leader has not 
been sequenced yet, but employing EM mapping data and typical RNA 
splice-site sequences, it has been inferred that this leader is transcribed 
from the r-strand from nucleotides 2215 to 2347. The l-strand between 
66.5 and 77.3 units codes for the DBP mRNA leaders from positions 75.1, 
72.0, and 68.8, respectively. The structure of the corresponding TATA 
boxes and individual leaders was described in Section VII.D. 
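
The statement that ORF 1 and ORF 2 lie in "the other two reading frames" relative to the 100-kd coding sequence can be checked from the quoted positions alone. The sketch below takes the phase of a codon to be its first nucleotide position modulo 3, which is an arbitrary convention chosen here for illustration; the function names are hypothetical.

```python
def frame(position: int) -> int:
    """Reading-frame phase (0, 1, or 2) of a codon starting at `position`."""
    return position % 3


STOP_100K = 890            # termination codon of the 100-kd protein (Fig. 21)
ORF1 = (303, 306, 1191)    # upstream stop, first codon, downstream stop
ORF2 = (1003, 1006, 1492)


def describe(name, orf):
    """Return (name, phase, number of codons between the two stop codons)."""
    _upstream_stop, first_codon, downstream_stop = orf
    phases = {frame(p) for p in orf}
    # All three positions must share one phase for the ORF to be consistent.
    assert len(phases) == 1, f"{name}: positions are not in a single frame"
    codons = (downstream_stop - first_codon) // 3
    return name, phases.pop(), codons


orf1 = describe("ORF 1", ORF1)
orf2 = describe("ORF 2", ORF2)

print(orf1, orf2, ("100-kd stop", frame(STOP_100K)))
```

Under this convention the three coding regions fall in three different phases, and the two ORFs overlap between nucleotides 1006 and 1191, in line with the overlap described in the text.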



F. Early Region E3 (76.6-86.0) 

This region, located between coordinates 76.6 and 86.0, codes for a 
large number of r-strand transcripts and polypeptides (Fig. 6). At least six 
major species of mRNAs have been identified, coding for proteins of 13, 
14, 15.5-16, and 19-21 kd, respectively (Lewis et al., 1976; Harter et al., 
1976; Green et al., 1979d; Ross et al., 1980). The polypeptides of 19-21 
kd are glycoproteins, which are associated with the membrane fraction 
(Persson et al., 1979b, 1980a). Tryptic peptide analysis has shown that 
the 16-kd polypeptide is the unglycosylated precursor of the 19-kd protein 
(Persson et al., 1980b). 

The mRNAs from this region share sequences at their 5' ends from 
coordinates 76.6 to 77.6, which are ligated to sequences starting at 78.6 
m.u. The 3' ends of the transcripts may vary. 

Nucleotide sequence analysis of this region has revealed that a TATA 
box of the structure TATAA is located at nucleotide 1947 (76.7 m.u.), 
while transcription starts at nucleotide 1976/1978 (Baker and Ziff, 1981) 
[Fig. 21 (Appendix B)]. In region E3, two polyadenylation sites are present, 
one of which has been mapped at the nucleotide level (nucleotide 4148). 
Examination of the sequence of this region reveals that the sequence 
ATTAAA is found at position 4136. This sequence differs from the com- 
mon hexanucleotide AATAAA that is found in all other Ad2 mRNAs 
associated with the polyadenylation site. In the sequence of region E3, 
the sequence AATAAA is located at nucleotide 5209, which fits very well 
with EM mapping data of some E3 mRNA species. However, for these 
messengers, the polyadenylation site has not yet been determined in de- 
tail. 

The first ATG in the E3 region is found at position 2266, which 
suggests that E3 mRNAs have a 290-nucleotide-long untranslatable re- 
gion at their 5' ends. About 80 nucleotides downstream from this ATG 
lies a potential splice site, and this site fits very well with the position 
where the common leader sequence of E3 mRNAs has been mapped (po- 
sitions 76.6-77.6). This leader sequence may code for 27 amino acid res- 
idues, which would be common to all E3 proteins. However, determi- 
nation of the amino-terminal sequence of the unglycosylated 16-kd 
protein has shown that translation of the coding sequence of this protein 
starts at nucleotide 3179 and continues to nucleotide 3656. This codes 
for a protein of 159 amino acids with a molecular weight of 18.4K. Ob- 
viously, the ATG at position 2266 present in all E3 mRNAs is not rec- 
ognized during translation. If the 3' splice point of the first E3 intervening 
sequence is located around position 2840 (Herisse et al., 1980), this im- 
plies that the mRNA for the 16-kd protein has an untranslated region 
more than 700 nucleotides long. Region E3 contains a number of short 
URFs. A hypothetical organization of translation is indicated in Fig. 6. 
Unfortunately, no data are available to assign the URFs unambiguously 
to individual proteins. As described above, the only exception is the 16- 
kd protein. The function of the E3 proteins is completely obscure. In some 
adenovirus-simian virus 40 hybrids, this region is absent without af- 
fecting the viability of the virus. Apparently this region is nonessential 
for viral multiplication (for a review, see Tooze, 1981). In addition to the 
E3 proteins, this region codes for two additional leaders of the fiber 
mRNAs, viz., the y-leader (78.6-79.2) and the z-leader (84.7-85.1) (Chow 
and Broker, 1978). Only the y-leader has been sequenced and appears to 
be located at nucleotides 2741-2924 (Zain et al., 1979a). Employing EM 
mapping data and the common sequences of RNA splice sites, it has been 
inferred that the z-leader is located at nucleotides 4805-4963 (Herisse et 
al., 1980). 
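
The lengths quoted for the E3 untranslated regions follow from the nucleotide positions given above. The sketch below reproduces that arithmetic. Two values are assumptions made only for illustration: the cap site is taken as 1976 (the text gives 1976/1978), and the 5' splice site is placed 27 codons downstream of the first ATG, since the text says only "about 80 nucleotides".

```python
CAP_SITE = 1976                        # first of the two quoted start sites
FIRST_ATG = 2266                       # first ATG in region E3
ASSUMED_DONOR = FIRST_ATG + 27 * 3     # assumed 5' splice site (27 leader codons)
ACCEPTOR = 2840                        # "around position 2840" in the text
START_16K = 3179                       # start of the 16-kd coding sequence
END_16K = 3656                         # quoted end of the 16-kd coding sequence

# Untranslated stretch between the cap site and the first ATG.
leader_utr = FIRST_ATG - CAP_SITE

# Number of codons in the 16-kd coding sequence.
codons_16k = (END_16K - START_16K) // 3

# 5' untranslated region of the spliced 16-kd mRNA: leader exon plus the
# stretch between the acceptor site and the start codon.
utr_16k = (ASSUMED_DONOR - CAP_SITE) + (START_16K - ACCEPTOR)

print(leader_utr, codons_16k, utr_16k)
```

With these inputs the leader stretch is 290 nucleotides, the coding sequence is 159 codons, and the untranslated region of the spliced message comes to just over 700 nucleotides, in line with the figures quoted in the text.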



G. Late Region L5 (86.0-91.3) 

The L5 family of late transcripts consists of two major mRNA species 
that code for a single virion protein, the fiber (polypeptide IV). The main 
bodies of these RNAs map between coordinates 86.0 and 91.3 (Miller et 
al., 1980) (Fig. 6). RNA from this region differs from all other late mes- 
sengers in that it may contain, in addition to the common tripartite 
leader, additional leader sequences (x, y, and z) from map positions 77.2, 
78.6, and 84.7 (Chow and Broker, 1978; Zain et al., 1979a). The y-leader 
is the most abundant additional leader of fiber mRNA; however, even 
this leader is not present in all RNA species. It has been shown that the 
presence or absence of the y-leader does not influence the translation of 
fiber mRNA. Even in the absence of the y-leader, the mRNA can be 
translated normally to fiber protein in an in vitro translation system 
(Dunn et al., 1978). The nucleotide sequence of this leader has been es- 
tablished to be 184 nucleotides long, and although an ATG is present in 
this sequence, it is obviously not employed and not required for appro- 
priate translation of fiber mRNA. 

The complete nucleotide sequence of region L5 has been established 
(Zain et al., 1979a; Zain and Roberts, 1979; Herisse and Galibert, 1981; 
Herisse et al., 1981; Gingeras et al., 1982) [Fig. 21 (Appendix B)]. The 5' 
end of the main body of the fiber mRNA is located at nucleotide 5395, 
adjacent to the start codon of fiber mRNA at position 5397 (Zain and Roberts, 
1979; Zain et al., 1979a). The termination codon of the fiber gene is 
located at nucleotide 7143 and is part of the polyadenylation signal AA- 
TAAA at position 7141. The mRNA codes for 582 amino acid residues 
that constitute a protein with a theoretical molecular weight of 61.9K, 
which agrees very well with the apparent molecular weight of the fiber 
protein of 62K. 
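
The fiber coordinates quoted above are mutually consistent, which can be verified with a few lines of arithmetic. The sketch below assumes that position 5397 marks the first nucleotide of the first codon and that 7141 marks the first nucleotide of the AATAAA hexanucleotide; variable names are illustrative.

```python
START_CODON = 5397     # first nucleotide of the fiber coding sequence
N_RESIDUES = 582       # amino acid residues encoded by the fiber mRNA
SIGNAL_START = 7141    # first nucleotide of the polyadenylation signal
SIGNAL = "AATAAA"

# The termination codon begins immediately after the last sense codon.
stop_codon = START_CODON + 3 * N_RESIDUES

# Locate the termination codon inside the hexanucleotide signal.
offset = stop_codon - SIGNAL_START
stop_triplet = SIGNAL[offset:offset + 3]

print(stop_codon, offset, stop_triplet)
```

The computed termination codon falls at nucleotide 7143, two nucleotides into the hexanucleotide, where the triplet reads TAA. This matches the statement that the termination codon is part of the polyadenylation signal.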

H. Early Region E4 (91.3-99.2) 

Early region E4 messengers are transcribed from the viral l-strand 
between coordinates 91.3 and 99.0 and code for a large set of polypeptides 
(Fig. 6). The promoter of this region has been mapped at 99.2 m.u., while 
the 3' ends of E4 RNAs have been localized at 91.3 m.u. (Berk and Sharp, 
1978; Chow et al., 1979a,b; Baker and Ziff, 1981; Hashimoto et al., 1981). 

All E4 mRNAs share their 5'- and 3'-terminal nucleotide sequences, 
but vary in the location of splice points (Berk and Sharp, 1978; Chow et 
al., 1979a; Kitchingman and Westphal, 1980). These messengers code for 
a number of polypeptides with molecular weights of 11, 13, 17, 19, 21, 
and 24K (Lewis et al., 1976; Green et al., 1979d; Ross et al., 1980). As 
yet, these proteins have not been assigned unambiguously to individual 
mRNA species. Only the position of the acidic 11K polypeptide has been 
correlated to a specific region in the nucleotide sequence of this region 
(Herisse et al., 1981). 

Synthesis of the E4 proteins starts about 2 hr after infection, reaches 
a maximum around 3 hr, and then declines. These proteins seem to be 
nonessential for DNA replication, and their role is at present unknown. 

Recently the complete Ad2 nucleotide sequence of this region has 
been established (Shinagawa et al., 1980; Herisse et al., 1981; Gingeras 
et al., 1982) [Fig. 21 (Appendix B)], while for Ad5, the region between 97 
and 100 m.u. has been determined (Steenbergh and Sussenbach, 1979) 
[Fig. 25.1 (Appendix B)]. At nucleotide 10,008 in the Ad2 sequence, a 
TATA box with the structure TATATATA can be recognized as part of 
a promoter sequence. Transcription begins with the sequence TTTTTA 
at nucleotides 9981-9976, leading to a heterogeneous array of starts 
(Baker and Ziff, 1981) (Fig. 21). All major species of mRNAs contain a 
leader sequence starting at the cap sites and probably terminating at nu- 
cleotide 9915, where a potential 5' splice site is located. This leader se- 
quence contains no ATG that could play a role in initiation of translation. 
Therefore, such a signal should be located in the body of the various 
mRNA species spliced to this leader sequence. At the other end of the 
sequence, transcription terminates close to an AATAAA sequence, which 
is located at position 7188. This is consistent with EM mapping data of 
E4 RNAs. It should be pointed out that transcription sometimes proceeds 
beyond this point to coordinate 61.5, leading to the production of a minor 
species of E2a mRNA (see Fig. 1). 
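
Nucleotide positions in the Fig. 21 numbering can be related to map coordinates by linear interpolation. The sketch below is only an approximation under stated assumptions: it pairs the E3 TATA box (nucleotide 1947, quoted at 76.7 m.u.) with the E4 TATA box (nucleotide 10,008, taken here to lie at the 99.2 m.u. promoter position), and assumes map units scale linearly with nucleotide number between them. The function name is hypothetical.

```python
# (Fig. 21 nucleotide number, map units). The first pair is quoted directly in
# the text; the second pairing of 10,008 with 99.2 m.u. is an assumption.
ANCHORS = ((1947, 76.7), (10008, 99.2))


def to_map_units(nucleotide: int) -> float:
    """Linearly interpolate a Fig. 21 nucleotide number to map units."""
    (n0, m0), (n1, m1) = ANCHORS
    return m0 + (nucleotide - n0) * (m1 - m0) / (n1 - n0)


for label, position in (
    ("E4 AATAAA", 7188),
    ("fiber main body 5' end", 5395),
    ("ATTAAA at 6323", 6323),
):
    print(f"{label}: nucleotide {position} is near {to_map_units(position):.1f} m.u.")
```

Under these assumptions the E4 polyadenylation signal at nucleotide 7188 falls close to 91.3 m.u., and the 5' end of the fiber main body falls within a few tenths of a map unit of 86.0. Small discrepancies are expected because map coordinates derived from EM and restriction mapping are themselves approximate.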

The nucleotide sequence of the E4 region reveals that a large number 
of short URFs are present in all three reading frames. 

Comparison of the nucleotide sequence and the mRNA mapping data 
indicates that there is a reasonably good correlation between the mapping 
data and potential donor and acceptor splice sites in the sequence. From 
the predicted structure of the various spliced mRNA species, a hypo- 
thetical translation pattern has been proposed (Herisse et al., 1981; Gin- 
geras et al., 1982). However, only in the case of the acidic 11K protein 
could its coding region be deduced with reasonable certainty from the 
nucleotide sequence to be located in URF 23. Further nucleotide sequence 
analysis of mRNAs and translation of individual mRNA species is re- 
quired to determine unambiguously the relationship between individual 
RNAs and the corresponding proteins. 

I. Unidentified Reading Frames 

In addition to the URFs of early region E4, an additional ORF with 
a coding capacity of 12 kd (ORF 3) is found in the viral l-strand transcripts 
(Fig. 6). This region is located between stop codons at positions 7193 and 
6902 and starts with AAA (7190) (Fig. 21). At nucleotide 7166, the first 
ATG codon is found, while at nucleotide 6323, even the sequence AT- 
TAAA is present, which resembles an aberrant type of polyadenylation 
signal also present in early region E3. It should be noted that although 
the major E4 transcription termination site has been mapped at 91.3 m.u., 
Nevins et al. (1980) have calculated that transcription termination takes 
place at 88.4 m.u., which corresponds very well with the sequence AT- 
TAAA at nucleotide 6323 (Herisse et al., 1981). However, no mRNA 
species derived from this region are currently known. The same holds for 
two URFs in r-strand transcripts that code for proteins with theoretical 
molecular weights of 10.6 and 12K (URFs 26 and 27). 

VIII. COMPARISON OF GENOMES AND CONCLUDING 
REMARKS 

The organization of the adenovirus genome as described in Section 
VII has mainly been restricted to Ad2 because the most detailed infor- 
mation is available for this serotype. However, it should be emphasized 
that for all serotypes the structure of which has been investigated, the 
same overall organization has been observed. For a number of serotypes, 
nucleotide sequence data are available. These data are compiled in Ap- 
pendix B, including the analysis of these sequences. For a number of genes, 
the nucleotide sequences have been compared, as well as the amino acid 
sequences of the corresponding proteins. Van Ormondt et al. (1980b) have 
analyzed the homology among the E1a regions of Ad5, Ad7, and Ad12, 
while Bos et al. (1981) and Kimura et al. (1981) have studied the homology 
of the E1b regions of Ad5 and Ad12. The IVa2 and polypeptide IX genes 
of Ad2, Ad3, Ad5, and Ad7 have been compared (Dijkema et al., 1981; 
Engler, 1981; Engler and van Bree, 1982), as well as the late leaders of 
Ad2, Ad3, and Ad7 (Engler et al., 1981) and the E2b regions of Ad2 and 
Ad7 (Engler et al., 1983). The redundancies of different serotypes were 
analyzed by Tolun et al. (1979) and Shinagawa and Padmanabhan (1980), 
while the DNA-binding protein genes of Ad2, Ad5, and Ad12 were com- 
pared by Kruijer et al. (1981, 1982, 1983). 

Detailed analysis of the organization of the adenovirus genome re- 
veals that the available coding information of this virus is used in a very 
economical fashion. Unraveling of the information at the nucleotide level 
reveals all kinds of peculiar properties in its organization. There are 
spliced and unspliced mRNA species (e.g., hexon and polypeptide IX 
RNA), overlapping termination codons and AATAAA signals (e.g., fiber 
and IVa2 RNA), overlapping genes (e.g., the 33- and 100-kd proteins), and 
symmetrical transcription (120-kd protein and the 16-kd i-leader product). 
There are classic TATA boxes (e.g., E1a proteins) and polyadenylation 
signals (AATAAA) (hexon RNA) and aberrant sequences with the same 
function [TATA box TCCTT (E2a early promoter) and polyadenylation 
signal ATTAAA (region E3)]. 

In conclusion, the adenovirus genome is a microuniverse in itself, 
and the study of its organization and regulation of expression is a great 
joy and satisfaction for every scientist who dedicates herself or himself 
to the unraveling of its secrets. 

APPENDIX A: RESTRICTION ENDONUCLEASE CLEAVAGE 
MAPS 

This appendix contains a compilation of restriction maps of the ge- 
nomes of different adenovirus serotypes (Figs. 7-17). These maps have 
partially been published and partially been presented as personal com- 
munications. Most of these maps have been compiled before by Tooze 
(1981) and are redrawn with permission from the Cold Spring Harbor 
Laboratory Publication Department. The coordinates of the Ad1, Ad2, 
and Ad5 maps have been recalculated (Gingeras et al., 1982). Details on 
the origin of the maps are indicated in Tooze (1981), unless otherwise 
stated. 




FIGURE 7A-D. Restriction endonuclease cleavage maps of Group C Ad1, Ad2, and Ad5. 

FIGURE 8. Restriction endonuclease cleavage maps of Group C Ad6. The maps were de- 
termined by Naroditsky et al. (1980) and oriented such that the transforming region is 
located at the left. The EcoRI map was determined by Forsblom et al. (1976). 





FIGURE 9A, B. Restriction endonuclease cleavage maps of Group B Ad3 and Ad7. The BstEII 
and BclI maps were determined by R. Padmanabhan (personal communication). 

FIGURE 11. Restriction endonuclease cleavage maps of Group A Ad12 (Huie). 

FIGURE 12. Restriction endonuclease cleavage maps of Group A Ad31 (strain 1315). The 
maps were determined by Y. Sawada, Y. Yamashita, F. Kamda, K. Sekikawa, and K. Fujinaga 
(personal communication). 

FIGURE 13. Restriction endonuclease cleavage maps of Group E Ad4. These maps were 
determined by Tokunaga et al. (1982). 




FIGURE 14A, B. Restriction endonuclease cleavage maps of simian adenovirus type 7. The 
EcoRI, SalI, and BglII maps of simian adenovirus (strain C8) were determined by Naroditsky 
et al. (1980) and oriented with respect to the conventional genetic map by Ponomareva et 
al. (1979), who located the transforming region to the left. The other maps were determined 
by T. I. Tikchonenko and colleagues (personal communication). 

FIGURE 15. Restriction endonuclease cleavage maps of simian adenovirus type 20. These 
maps were determined by T. I. Tikchonenko and colleagues (personal communication). 

FIGURE 16. Restriction endonuclease cleavage maps of simian adenovirus type 3a. The 
EcoRI and BglII maps were determined by Dimitrov et al. (1979). They were originally 
reported to be those of simian adenovirus type 38, and [...] 
by Tikchonenko and colleagues (personal communication), who also determined the other 

FIGURE 17A, B. Restriction endonuclease cleavage maps of [...] 
These maps were determined by Larsen et al. (1979). For the orientation, see Larsen et al. 
(1979) and Temple et al. (1981). 

APPENDIX B: NUCLEOTIDE SEQUENCES 

This appendix contains a compilation of nucleotide sequences par- 
tially published and partially presented as personal communications 
(Figs. 18-29). Since r-strand transcripts are homologous to the l-strand, 
the positions of important landmarks for r-strand transcripts are indicated 
in the l-strand sequence. Likewise, strategic sequences for l-strand tran- 
scripts are indicated in the r-strand sequence. The sequences of Ad2 and 
Ad5 are very homologous. Therefore, it has been supposed that specific 
signals identified in the sequence of one serotype also indicate the po- 
sitions of these signals in the sequence of the other serotype. The posi- 
tions of the inverted terminal repetition boundaries and start and ter- 
mination codons of known coding regions are indicated, as well as the 
positions of 5' and 3' ends of mRNAs, splice points, and TATA boxes. 
The latter signals are supposed to be a constitutive part of transcriptional 
promoters. The sequences AATAAA and ATTAAA, which are found 
within about 30 nucleotides from the 3' end of the mRNAs, are under- 
lined. These sequences have been associated with polyadenylation. Open 
reading frames (ORFs), defined as regions between two termination co- 
dons in the same frame, have been indicated when the size exceeds 300 
nucleotides. The same holds for unidentified reading frames (URFs) (re- 
gions that start with an ATG codon and terminate with one of the ter- 
mination codons). 
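
The ORF definition used in this appendix (a stretch between two termination codons in the same frame, reported when it exceeds 300 nucleotides) can be sketched as a short scan over one strand. This is a minimal illustration and not the procedure used to prepare the figures; the function name and return format are assumptions, and only the strand as given is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq: str, min_length: int = 300):
    """Return (frame, first_base, last_base) for stretches bounded by two
    in-frame stop codons and longer than `min_length` nucleotides.

    Coordinates are 1-based and inclusive, and exclude both stop codons.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        previous_stop = None  # 0-based index of the last stop seen in this frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if previous_stop is not None:
                    start = previous_stop + 3   # 0-based, first base after the stop
                    if i - start > min_length:
                        found.append((frame, start + 1, i))
                previous_stop = i
    return found


# Toy sequence: a stop codon, five sense codons, then another in-frame stop.
toy = "TAA" + "GCT" * 5 + "TGA"
print(open_reading_frames(toy, min_length=10))
```

Note that the text quotes the position of the downstream stop codon itself as the end of an ORF (for example, ORF 1 from 306 to 1191 between stop codons at 303 and 1191), whereas this sketch reports the last base before that stop codon.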

FIGURE 18A-L. Nucleotide sequence of [...] DNA. This sequence was determined by 
Gingeras et al. [...] (nucleotides 5776-11,558). [...] by Gingeras and co-workers, most 
other investigators [...] in their Ad2 strains. In the latter case, the terminal sequences 
of Ad2 and Ad5 are identical. To allow comparison of Ad2 and Ad5 sequences, [...] 
and VII.B. 



FIGURE 19. Nucleotide sequence of a region between coordinates 49.0 and 51.8 on the Ad2 
genome. [...] strategic signals were determined by Akusjarvi and Pettersson (1979a) and 
Akusjarvi and Persson (1981a). 

FIGURE 20.2. Structural organization of a region between coordinates 59.5 and 66.4 on the 
Ad2 genome. This map is derived from the nucleotide sequence in Fig. 20.1. For explanation 
of the symbols, see the Fig. 3 caption (Section VII). 




FIGURE 21A-K. Nucleotide sequence of a region between coordinates 70.7 and 100.0 on 
the Ad2 genome. This sequence was established by Galibert et al. (1979), Herisse et al. 
(1980), and Herisse and Galibert (1981). Short sequences were also determined by Zain et 
al. (1979a,b), Zain and Roberts (1979), Baker and Ziff (1980, 1981), Arrand and Roberts (1979), 
and Shinagawa et al. (1980). The region between 89.5 and 100 was also determined by 
Gingeras et al. (1982). 

FIGURE 23.1A-L. Nucleotide sequence of a region between coordinates 0.0 and 31.7 on 
the Ad5 genome. This sequence was established by Steenbergh et al. (1977), van Ormondt 
et al. (1978), Maat and van Ormondt (1979), Maat et al. (1980), van Beveren et al. (1981), 
Bos et al. (1981), and H. van Ormondt and B. M. M. Dekker (personal communication). For 
interpretations, see van der Eb et al. (1979) and van Ormondt et al. (1980a,b). 

FIGURE 23.2A-C. Structural organization of a region between coordinates 0.0 and 31.7 on 
the Ad5 genome. This map is derived from the nucleotide sequence in Fig. 23.1. For the 
positioning of strategic signals, see Fig. 3 (Section VII). 

FIGURE 24A-D. [...] 
the Ad5 genome. This sequence and the positions of splice points and leaders were deter- 
mined by Kruijer et al. (1980, 1981, 1983). A schematic presentation of this sequence is 
shown in Fig. 5 (Section VII). 




FIGURE 25.1. Nucleotide sequence of a region between coordinates 97.0 and 100.0 on the 
Ad5 genome. This sequence was determined by Steenbergh et al. (1977) and Steenbergh and 
Sussenbach (1979). The strategic sequences were determined by Baker and Ziff (1980, 1981) 
and further derived from the Ad2 sequence of this region (Fig. 21). 

FIGURE 25.2. Structural organization of a region between coordinates 97.0 and 100.0 on 
the Ad5 genome. This map is derived from the nucleotide sequence in Fig. 25.1. 

FIGURE 26.1A-K. Nucleotide sequence of a region between coordinates 0 and 31.7 on the 
Ad7 genome. This sequence and the positions of strategic sequences were established by 
Dijkema and Dekker (1979), Dijkema et al. (1980a,b, 1981, 1982), van Beveren et al. (1981), 
Engler (1981), Engler et al. (1981, 1983), and Engler and van Bree (1982). 

FIGURE 26.2A-C. Structural organization of a region between coordinates 0 and 31.7 on 
the Ad7 genome. This map is derived from the nucleotide sequence in Fig. 26.1. For details, 
see Fig. 3 (Section VII). 

FIGURE 27.1A-D. Nucleotide sequence of a region between coordinates 0.0 and 11.5 on 
the Ad12 genome. This sequence and strategic signals were established by Fujinaga et al. 
(1979), Sugisaka et al. (1980), Kimura et al. (1981), and Bos et al. (1981). 

FIGURE 27.2. Structural organization of a region between coordinates 0.0 and 11.5 on the 
Ad12 genome. This map is derived from the nucleotide sequence in Fig. 27.1. 

FIGURE 28.1A, B. Nucleotide sequence of a region between coordinates 61.5 and 67.0 on 
the Ad12 genome. This sequence was established by Kruijer et al. (1983). 

FIGURE 28.2. Structural organization of a region between coordinates 61.5 and 67.0 on the 
Ad12 genome. This map is derived from the nucleotide [...] 

FIGURE 29A, B. Nucleotide sequence of inverted terminal repetitions. The origins of the 
sequences are as follows: (A) Ad3: Tolun et al. (1979). Ad4: Tokunaga et al. (1982). Ad2/ 
Ad5: The Ad2 sequence was determined by Shinagawa and Padmanabhan (1979) and the 
Ad5 sequence by Steenbergh et al. (1977). The two sequences are identical. Arrand and 
Roberts (1979) have analyzed an Ad2 strain that missed base pair 9 (t). Ad7: These sequences 
were determined for strain Gomen by Dijkema and Dekker (1979) (a) and for strain Greider 
by Shinagawa and Padmanabhan (1980) (b). The differences between the sequences are in- 
dicated. (B) Ad12: Tolun et al. (1979) (a), Sugisaka et al. (1980) (a), Shinagawa and Padman- 
abhan (1980) (b), and Schwarz et al. (1982) (c). The differences between the sequences are 
indicated. Ad18: Garon et al. (1982). CELO: Alestrom et al. (1982a). FL: Temple et al. (1981). 
In the human sequences, the conserved sequences 9-22 are underlined; the homologous 
regions in CELO and FL DNA are indicated by dashed underlines. The common sequence 
TGACGT discovered by Shinagawa and Padmanabhan (1980) is underlined. 